Poster "vision-language understanding" Papers
5 papers found
Anyprefer: An Agentic Framework for Preference Data Synthesis
Yiyang Zhou, Zhaoyang Wang, Tianle Wang et al.
ICLR 2025, arXiv:2504.19276
11 citations
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
Yuxuan Cai, Jiangning Zhang, Haoyang He et al.
ICCV 2025, arXiv:2410.16236
27 citations
SCOPE: Saliency-Coverage Oriented Token Pruning for Efficient Multimodel LLMs
Jinhong Deng, Wen Li, Joey Tianyi Zhou et al.
NeurIPS 2025, arXiv:2510.24214
Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning
Jiachen Li, Qiaozi Gao, Michael Johnston et al.
ICML 2024, arXiv:2310.09676
17 citations
X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
Swetha Sirnam, Jinyu Yang, Tal Neiman et al.
ECCV 2024, arXiv:2407.13851
11 citations