"on-policy reinforcement learning" Papers
7 papers found
GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning
Shutong Ding, Ke Hu, Shan Zhong et al.
NeurIPS 2025, arXiv:2505.18763, 6 citations
Studying the Interplay Between the Actor and Critic Representations in Reinforcement Learning
Samuel Garcin, Trevor McInroe, Pablo Samuel Castro et al.
ICLR 2025, arXiv:2503.06343, 5 citations
Absolute Policy Optimization: Enhancing Lower Probability Bound of Performance with High Confidence
Weiye Zhao, Feihan Li, Yifan Sun et al.
ICML 2024
Colored Noise in PPO: Improved Exploration and Performance through Correlated Action Sampling
Jakob Hollenstein, Georg Martius, Justus Piater
AAAI 2024, arXiv:2312.11091, 8 citations
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Fahim Tajwar, Anikait Singh, Archit Sharma et al.
ICML 2024, arXiv:2404.14367, 179 citations
Reflective Policy Optimization
Yaozhong Gan, Renye Yan, Zhe Wu et al.
ICML 2024, arXiv:2406.03678, 2 citations
SAPG: Split and Aggregate Policy Gradients
Jayesh Singla, Ananye Agarwal, Deepak Pathak
ICML 2024, arXiv:2407.20230, 13 citations