Oral "benchmark evaluation" Papers
2 papers found
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding
Fei Wang, Xingyu Fu, James Y. Huang et al.
ICLR 2025, oral, arXiv:2406.09411
120 citations
TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models
Ziyao Shangguan, Chuhan Li, Yuxuan Ding et al.
ICLR 2025, oral, arXiv:2410.23266
37 citations