Poster "policy improvement" Papers
4 papers found
Conference
When Can Model-Free Reinforcement Learning be Enough for Thinking?
Josiah Hanna, Nicholas Corrado
NeurIPS 2025, arXiv:2506.17124
1 citation
Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays
Qingyuan Wu, Simon Zhan, Yixuan Wang et al.
ICML 2024, arXiv:2402.03141
4 citations
ODIN: Disentangled Reward Mitigates Hacking in RLHF
Lichang Chen, Chen Zhu, Jiuhai Chen et al.
ICML 2024, arXiv:2402.07319
110 citations
Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills
Kolby Nottingham, Bodhisattwa Prasad Majumder, Bhavana Dalvi et al.
ICML 2024, arXiv:2402.03244
12 citations