"benchmark development" Papers
5 papers found
Conference
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai, Enxin Song, Yilun Du et al.
ICLR 2025 (oral), arXiv:2410.03051, 105 citations
Do as We Do, Not as You Think: the Conformity of Large Language Models
Zhiyuan Weng, Guikun Chen, Wenguan Wang
ICLR 2025, arXiv:2501.13381, 20 citations
How efficient is LLM-generated code? A rigorous & high-standard benchmark
Ruizhong Qiu, Weiliang Zeng, James Ezick et al.
ICLR 2025, arXiv:2406.06647, 45 citations
MLVU: Benchmarking Multi-task Long Video Understanding
Junjie Zhou, Yan Shu, Bo Zhao et al.
CVPR 2025, arXiv:2406.04264, 105 citations
Offline Multi-Objective Optimization
Ke Xue, Rong-Xi Tan, Xiaobin Huang et al.
ICML 2024, arXiv:2406.03722, 13 citations