Poster "multimodal inputs" Papers
4 papers found
Conference
BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization
Xueyang Zhou, Guiyao Tie, Guowen Zhang et al.
NeurIPS 2025, arXiv:2505.16640
13 citations
IDEA-Bench: How Far are Generative Models from Professional Designing?
Chen Liang, Lianghua Huang, Jingwu Fang et al.
CVPR 2025, arXiv:2412.11767
4 citations
Dolphins: Multimodal Language Model for Driving
Yingzi Ma, Yulong Cao, Jiachen Sun et al.
ECCV 2024, arXiv:2312.00438
128 citations
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace et al.
CVPR 2024, arXiv:2402.19479
351 citations