Poster "trust region methods" Papers
3 papers found
Learn Your Reference Model for Real Good Alignment
Alexey Gorbatovski, Boris Shaposhnikov, Alexey Malakhov et al.
ICLR 2025, arXiv:2404.09656
50 citations
ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints
Akhil Agnihotri, Rahul Jain, Haipeng Luo
ICML 2024, arXiv:2302.00808
2 citations
Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation
Juntao Dai, Yaodong Yang, Qian Zheng et al.
ICML 2024, arXiv:2412.11138
3 citations