"model performance analysis" Papers
2 papers found
Conference
KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation
Jiajun Shi, Jian Yang, Jiaheng Liu et al.
NeurIPS 2025 (spotlight) · arXiv:2505.14552 · 4 citations
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark for Large Language Models
Bofei Gao, Feifan Song, Zhe Yang et al.
ICLR 2025 · arXiv:2410.07985 · 149 citations