Highlight "visual question answering" Papers
4 papers found
CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering
Tianyu Huai, Jie Zhou, Xingjiao Wu et al.
CVPR 2025, highlight, arXiv:2503.00413
10 citations
Scaling Language-Free Visual Representation Learning
David Fan, Shengbang Tong, Jiachen Zhu et al.
ICCV 2025, highlight, arXiv:2504.01017
41 citations
STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection
Divya Velayudhan, Abdelfatah Ahmed, Mohamad Alansari et al.
CVPR 2025, highlight, arXiv:2504.02823
2 citations
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models
Zhang Li, Biao Yang, Qiang Liu et al.
CVPR 2024, highlight, arXiv:2311.06607
392 citations