"visual understanding" Papers
10 papers found
Conference
Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval
Dohwan Ko, Ji Soo Lee, Minhyuk Choi et al.
ICCV 2025highlightarXiv:2507.23284
3
citations
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation
Cheng Yang, Chufan Shi, Yaxin Liu et al.
ICLR 2025arXiv:2406.09961
69
citations
Evaluating Vision-Language Models as Evaluators in Path Planning
Mohamed Aghzal, Xiang Yue, Erion Plaku et al.
CVPR 2025arXiv:2411.18711
4
citations
FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities
Jin Wang, Yao Lai, Aoxue Li et al.
NEURIPS 2025spotlightarXiv:2505.20147
21
citations
UniTok: a Unified Tokenizer for Visual Generation and Understanding
Chuofan Ma, Yi Jiang, Junfeng Wu et al.
NEURIPS 2025spotlightarXiv:2502.20321
79
citations
Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach
Jing Bi, Lianggong Bruce Wen, Zhang Liu et al.
CVPR 2025arXiv:2412.18108
18
citations
Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
Talfan Evans, Shreya Pathak, Hamza Merzic et al.
ECCV 2024arXiv:2312.05328
25
citations
GPT-4V(ision) is a Generalist Web Agent, if Grounded
Boyuan Zheng, Boyu Gou, Jihyung Kil et al.
ICML 2024arXiv:2401.01614
424
citations
Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding
Hoang-Quan Nguyen, Thanh-Dat Truong, Xuan-Bac Nguyen et al.
CVPR 2024highlightarXiv:2311.15206
29
citations
SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant
Guohao Sun, Can Qin, JIAMINAN WANG et al.
ECCV 2024arXiv:2403.11299
24
citations