Papers by Zhihong Shao
3 papers found
Conference
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
Huajian Xin, Z.Z. Ren, Junxiao Song et al.
ICLR 2025 · arXiv:2408.08152
142 citations
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
Zhibin Gou, Zhihong Shao, Yeyun Gong et al.
ICLR 2024 · arXiv:2305.11738
621 citations
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving
Zhibin Gou, Zhihong Shao, Yeyun Gong et al.
ICLR 2024 · arXiv:2309.17452
272 citations