Papers by Jianzhu Yao
2 papers found
Conference
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?
Zihan Zheng, Zerui Cheng, Zeyu Shen et al.
NeurIPS 2025, arXiv:2506.11928
29 citations
SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially?
Jianzhu Yao, Kevin Wang, Ryan Hsieh et al.
COLM 2025, arXiv:2503.12349
9 citations