"visual perception" Papers
6 papers found
Are Large Vision Language Models Good Game Players?
Xinyu Wang, Bohan Zhuang, Qi Wu
ICLR 2025, arXiv:2503.02358
15 citations
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Sanghwan Kim, Rui Xiao, Iuliana Georgescu et al.
CVPR 2025, arXiv:2412.01814
7 citations
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Min Shi, Fuxiao Liu, Shihao Wang et al.
ICLR 2025, arXiv:2408.15998
116 citations
Physics Context Builders: A Modular Framework for Physical Reasoning in Vision-Language Models
Vahid Balazadeh, Mohammadmehdi Ataei, Hyunmin Cheong et al.
ICCV 2025, arXiv:2412.08619
2 citations
Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach
Jing Bi, Lianggong Bruce Wen, Zhang Liu et al.
CVPR 2025, arXiv:2412.18108
18 citations
ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models
Rohan Wadhawan, Hritik Bansal, Kai-Wei Chang et al.
ICML 2024, arXiv:2401.13311
20 citations