Papers by Haian Huang
2 papers found
Conference
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
Chengqi Lyu, Songyang Gao, Yuzhe Gu et al.
COLM 2025
44 citations
Semi-off-Policy Reinforcement Learning for Vision-Language Slow-Thinking Reasoning
Junhao Shen, Haiteng Zhao, Yuzhe Gu et al.
NeurIPS 2025, arXiv:2507.16814
2 citations