"visual comprehension" Papers
6 papers found
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
Kaihang Pan, Wang Lin, Zhongqi Yue et al.
CVPR 2025, arXiv:2504.14666
20 citations
Janus-Pro-R1: Advancing Collaborative Visual Comprehension and Generation via Reinforcement Learning
Kaihang Pan, Yang Wu, Wendong Bu et al.
NeurIPS 2025, arXiv:2506.01480
7 citations
LIRA: Inferring Segmentation in Large Multi-modal Models with Local Interleaved Region Assistance
Zhang Li, Biao Yang, Qiang Liu et al.
ICCV 2025, arXiv:2507.06272
1 citation
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models
Hongbo Liu, Jingwen He, Yi Jin et al.
NeurIPS 2025, arXiv:2506.21356
7 citations
Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly
Yexin Liu, Zhengyang Liang, Yueze Wang et al.
CVPR 2025, arXiv:2406.10638
19 citations
Auto-Encoding Morph-Tokens for Multimodal LLM
Kaihang Pan, Siliang Tang, Juncheng Li et al.
ICML 2024 (spotlight), arXiv:2405.01926
32 citations