"advantage estimation" Papers
2 papers found
Conference
Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
Yiran Guo, Lijie Xu, Jie Liu et al.
NeurIPS 2025, arXiv:2505.23564
18 citations
ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages
Andrew Jesson, Christopher Lu, Gunshi Gupta et al.
ICML 2024, arXiv:2306.01460
10 citations