Spotlight Papers by Robert Tang
2 papers found
Conference
KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation
Jiajun Shi, Jian Yang, Jiaheng Liu et al.
NeurIPS 2025, spotlight, arXiv:2505.14552
4 citations
SciArena: An Open Evaluation Platform for Non-Verifiable Scientific Literature-Grounded Tasks
Yilun Zhao, Kaiyan Zhang, Tiansheng Hu et al.
NeurIPS 2025, spotlight, arXiv:2507.01001
10 citations