Poster "vision-language fusion" Papers
2 papers found
Conference
Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings
Qiong Wu, Wenhao Lin, Yiyi Zhou et al.
NeurIPS 2025, arXiv:2411.19628
5 citations
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Shilong Liu, Zhaoyang Zeng, Tianhe Ren et al.
ECCV 2024, arXiv:2303.05499
3442 citations