"pixel-level grounding" Papers
3 papers found
Conference
GroundingFace: Fine-grained Face Understanding via Pixel Grounding Multimodal Large Language Model
Yue Han, Jiangning Zhang, Junwei Zhu et al.
CVPR 2025highlight
1
citations
Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels
Yongshuo Zong, Qin ZHANG, DONGSHENG An et al.
CVPR 2025arXiv:2505.13788
3
citations
Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
Hao Zhong, Muzhi Zhu, Zongze Du et al.
NEURIPS 2025oralarXiv:2505.20256
14
citations