Poster "benchmark development" Papers
4 papers found
Do as We Do, Not as You Think: the Conformity of Large Language Models
Zhiyuan Weng, Guikun Chen, Wenguan Wang
ICLR 2025, arXiv:2501.13381
20 citations
How efficient is LLM-generated code? A rigorous & high-standard benchmark
Ruizhong Qiu, Weiliang Zeng, James Ezick et al.
ICLR 2025, arXiv:2406.06647
45 citations
MLVU: Benchmarking Multi-task Long Video Understanding
Junjie Zhou, Yan Shu, Bo Zhao et al.
CVPR 2025, arXiv:2406.04264
105 citations
Offline Multi-Objective Optimization
Ke Xue, Rong-Xi Tan, Xiaobin Huang et al.
ICML 2024, arXiv:2406.03722
13 citations