"trust region methods" Papers
4 papers found
Learn Your Reference Model for Real Good Alignment
Alexey Gorbatovski, Boris Shaposhnikov, Alexey Malakhov et al.
ICLR 2025, arXiv:2404.09656
50 citations
ACPO: A Policy Optimization Algorithm for Average MDPs with Constraints
Akhil Agnihotri, Rahul Jain, Haipeng Luo
ICML 2024, arXiv:2302.00808
2 citations
Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation
Juntao Dai, Yaodong Yang, Qian Zheng et al.
ICML 2024, arXiv:2412.11138
3 citations
Trust Region Methods for Nonconvex Stochastic Optimization beyond Lipschitz Smoothness
Chenghan Xie, Chenxi Li, Chuwen Zhang et al.
AAAI 2024, arXiv:2310.17319
14 citations