"visual understanding" Papers

10 papers found

Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval

Dohwan Ko, Ji Soo Lee, Minhyuk Choi et al.

ICCV 2025highlightarXiv:2507.23284
3
citations

ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation

Cheng Yang, Chufan Shi, Yaxin Liu et al.

ICLR 2025arXiv:2406.09961
69
citations

Evaluating Vision-Language Models as Evaluators in Path Planning

Mohamed Aghzal, Xiang Yue, Erion Plaku et al.

CVPR 2025arXiv:2411.18711
4
citations

FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities

Jin Wang, Yao Lai, Aoxue Li et al.

NEURIPS 2025spotlightarXiv:2505.20147
21
citations

UniTok: a Unified Tokenizer for Visual Generation and Understanding

Chuofan Ma, Yi Jiang, Junfeng Wu et al.

NEURIPS 2025spotlightarXiv:2502.20321
79
citations

Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach

Jing Bi, Lianggong Bruce Wen, Zhang Liu et al.

CVPR 2025arXiv:2412.18108
18
citations

Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding

Talfan Evans, Shreya Pathak, Hamza Merzic et al.

ECCV 2024arXiv:2312.05328
25
citations

GPT-4V(ision) is a Generalist Web Agent, if Grounded

Boyuan Zheng, Boyu Gou, Jihyung Kil et al.

ICML 2024arXiv:2401.01614
424
citations

Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding

Hoang-Quan Nguyen, Thanh-Dat Truong, Xuan-Bac Nguyen et al.

CVPR 2024highlightarXiv:2311.15206
29
citations

SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant

Guohao Sun, Can Qin, JIAMINAN WANG et al.

ECCV 2024arXiv:2403.11299
24
citations