Papers by Kongcheng Zhang
2 papers found
Conference
Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning
Kongcheng Zhang, QI YAO, Shunyu Liu et al.
NeurIPS 2025, arXiv:2506.08745
13 citations
SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data
Wenkai Fang, Shunyu Liu, Yang Zhou et al.
NeurIPS 2025, arXiv:2505.20347
25 citations