Poster "policy optimization" Papers
66 papers found • Page 2 of 2
Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation
JoonHo Lee, Jae Oh Woo, Juree Seok et al.
ICML 2024 • arXiv:2405.06424 • 3 citations
Information-Directed Pessimism for Offline Reinforcement Learning
Alec Koppel, Sujay Bhatt, Jiacheng Guo et al.
ICML 2024
Iterative Regularized Policy Optimization with Imperfect Demonstrations
Xudong Gong, Feng Dawei, Kele Xu et al.
ICML 2024
Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback
Songyang Gao, Qiming Ge, Wei Shen et al.
ICML 2024 • arXiv:2401.11458 • 21 citations
Model-based Reinforcement Learning for Confounded POMDPs
Mao Hong, Zhengling Qi, Yanxun Xu
ICML 2024
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback
Asaf Cassel, Haipeng Luo, Aviv Rosenberg et al.
ICML 2024 • arXiv:2405.07637 • 5 citations
Position: Automatic Environment Shaping is the Next Frontier in RL
Younghyo Park, Gabriel Margolis, Pulkit Agrawal
ICML 2024
Probabilistic Constrained Reinforcement Learning with Formal Interpretability
Yanran Wang, Qiuchen Qian, David Boyle
ICML 2024 • arXiv:2307.07084 • 5 citations
Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search through State Occupancy Regularization
Liam Schramm, Abdeslam Boularias
ICML 2024 • arXiv:2407.05511 • 1 citation
Provably Robust DPO: Aligning Language Models with Noisy Feedback
Sayak Ray Chowdhury, Anush Kini, Nagarajan Natarajan
ICML 2024 • arXiv:2403.00409 • 103 citations
Rate-Optimal Policy Optimization for Linear Markov Decision Processes
Uri Sherman, Alon Cohen, Tomer Koren et al.
ICML 2024 • arXiv:2308.14642 • 9 citations
Reflective Policy Optimization
Yaozhong Gan, Yan Renye, Zhe Wu et al.
ICML 2024 • arXiv:2406.03678 • 2 citations
ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages
Andrew Jesson, Christopher Lu, Gunshi Gupta et al.
ICML 2024 • arXiv:2306.01460 • 10 citations
Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences
Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban et al.
ICML 2024 • arXiv:2403.01857 • 20 citations
Risk-Sensitive Policy Optimization via Predictive CVaR Policy Gradient
Ju-Hyun Kim, Seungki Min
ICML 2024
Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation
Juntao Dai, Yaodong Yang, Qian Zheng et al.
ICML 2024 • arXiv:2412.11138 • 3 citations