"math reasoning benchmarks" Papers
3 papers found
$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training
Jin Zhou, Kaiwen Wang, Jonathan Chang et al.
NEURIPS 2025, arXiv:2502.20548
12 citations
Reasoning Is Not a Race: When Stopping Early Beats Going Deeper
Mohan Zhang, Jiaxuan Gao, Shusheng Xu et al.
NEURIPS 2025
MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models
Justin Chih-Yao Chen, Swarnadeep Saha, Elias Stengel-Eskin et al.
ICML 2024, arXiv:2402.01620
28 citations