"optimal advantage function" Papers
2 papers found
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Kianté Brantley, Mingyu Chen, Zhaolin Gao et al.
NeurIPS 2025, arXiv:2505.20686
12 citations
Learning Optimal Advantage from Preferences and Mistaking It for Reward
W Bradley Knox, Stephane Hatgis-Kessell, Sigurdur Orn Adalgeirsson et al.
AAAI 2024, arXiv:2310.02456
16 citations