"multimodal benchmarks" Papers
8 papers found
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Zhiyuan Liang, Dongwen Tang, Yuhao Zhou et al.
NeurIPS 2025 · arXiv:2506.16406 · 3 citations
Eve: Efficient Multimodal Vision Language Models with Elastic Visual Experts
Miao Rang, Zhenni Bi, Chuanjian Liu et al.
AAAI 2025 · arXiv:2501.04322 · 13 citations
Learning to Instruct for Visual Instruction Tuning
Zhihan Zhou, Feng Hong, Jiaan Luo et al.
NeurIPS 2025 · arXiv:2503.22215 · 3 citations
LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs
Haoran Lou, Chunxiao Fan, Ziyan Liu et al.
ICCV 2025 · arXiv:2507.00505
MMCR: Benchmarking Cross-Source Reasoning in Scientific Papers
Yang Tian, Zheng Lu, Mingqi Gao et al.
ICCV 2025 · arXiv:2503.16856 · 2 citations
Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
Davide Caffagni, Sara Sarto, Marcella Cornia et al.
CVPR 2025 · arXiv:2503.01980 · 6 citations
Evaluating Text-to-Visual Generation with Image-to-Text Generation
Zhiqiu Lin, Deepak Pathak, Baiqi Li et al.
ECCV 2024 · arXiv:2404.01291 · 357 citations
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Hao Zhang, Hongyang Li, Feng Li et al.
ECCV 2024 · arXiv:2312.02949 · 114 citations