Spotlight "policy optimization" Papers
4 papers found
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
Jingjing Jiang, Chongjie Si, Jun Luo et al.
NEURIPS 2025, spotlight, arXiv:2505.17534
5 citations
Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems
Christian Walder, Deep Tejas Karkhanis
NEURIPS 2025, spotlight, arXiv:2505.15201
28 citations
Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding
Hanyin Wang, Zhenbang Wu, Gururaj Kolar et al.
NEURIPS 2025, spotlight, arXiv:2505.21908
5 citations
Right Question is Already Half the Answer: Fully Unsupervised LLM Reasoning Incentivization
Qingyang Zhang, Haitao Wu, Changqing Zhang et al.
NEURIPS 2025, spotlight, arXiv:2504.05812
78 citations