Papers by Xiangyu Tian
2 papers found
Conference
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
Songjun Tu, Jiahao Lin, Xiangyu Tian et al.
COLM 2025, arXiv:2503.12854
17 citations
Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
Songjun Tu, Jiahao Lin, Qichao Zhang et al.
NeurIPS 2025, arXiv:2505.10832
39 citations