"reinforcement learning from human feedback" Papers
68 papers found • Page 2 of 2
Learning Optimal Advantage from Preferences and Mistaking It for Reward
W Bradley Knox, Stephane Hatgis-Kessell, Sigurdur Orn Adalgeirsson et al.
Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback
Songyang Gao, Qiming Ge, Wei Shen et al.
MaxMin-RLHF: Alignment with Diverse Human Preferences
Souradip Chakraborty, Jiahao Qiu, Hui Yuan et al.
MusicRL: Aligning Music Generation to Human Preferences
Geoffrey Cideron, Sertan Girgin, Mauro Verzetti et al.
Nash Learning from Human Feedback
Rémi Munos, Michal Valko, Daniele Calandriello et al.
ODIN: Disentangled Reward Mitigates Hacking in RLHF
Lichang Chen, Chen Zhu, Jiuhai Chen et al.
Position: Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback
Vincent Conitzer, Rachel Freedman, Jobst Heitzig et al.
Preference Ranking Optimization for Human Alignment
Feifan Song, Bowen Yu, Minghao Li et al.
Privacy-Preserving Instructions for Aligning Large Language Models
Da Yu, Peter Kairouz, Sewoong Oh et al.
Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization
Li Ding, Jenny Zhang, Jeff Clune et al.
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Ziniu Li, Tian Xu, Yushun Zhang et al.
Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences
Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban et al.
RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
Harrison Lee, Samrat Phatale, Hassan Mansoor et al.
RLVF: Learning from Verbal Feedback without Overgeneralization
Moritz Stephan, Alexander Khazatsky, Eric Mitchell et al.
Underspecification in Language Modeling Tasks: A Causality-Informed Study of Gendered Pronoun Resolution
Emily McMilin
WARM: On the Benefits of Weight Averaged Reward Models
Alexandre Ramé, Nino Vieillard, Léonard Hussenot et al.
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
Collin Burns, Pavel Izmailov, Jan Kirchner et al.
Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-World Multi-Turn Dialogue
Songhua Yang, Hanjie Zhao, Senbin Zhu et al.