Papers by Kuikun Liu
3 papers found
Conference
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
Chengqi Lyu, Songyang Gao, Yuzhe Gu et al.
COLM 2025
44 citations
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher
Zehui Chen, Kuikun Liu, Qiuchen Wang et al.
ICLR 2025, arXiv:2407.20183
54 citations
Semi-off-Policy Reinforcement Learning for Vision-Language Slow-Thinking Reasoning
Junhao Shen, Haiteng Zhao, Yuzhe Gu et al.
NeurIPS 2025, arXiv:2507.16814
2 citations