Paper "reinforcement learning from human feedback" Papers
6 papers found
Extragradient Preference Optimization (EGPO): Beyond Last-Iterate Convergence for Nash Learning from Human Feedback
Runlong Zhou, Maryam Fazel, Simon Shaolei Du
COLM 2025 · arXiv:2503.08942
13 citations
Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human Feedback
Johannes Ackermann, Takashi Ishida, Masashi Sugiyama
COLM 2025 · arXiv:2507.15507
Learning Optimal Advantage from Preferences and Mistaking It for Reward
W Bradley Knox, Stephane Hatgis-Kessell, Sigurdur Orn Adalgeirsson et al.
AAAI 2024 · arXiv:2310.02456
16 citations
Preference Ranking Optimization for Human Alignment
Feifan Song, Bowen Yu, Minghao Li et al.
AAAI 2024 · arXiv:2306.17492
337 citations
Underspecification in Language Modeling Tasks: A Causality-Informed Study of Gendered Pronoun Resolution
Emily McMilin
AAAI 2024 · arXiv:2210.00131
Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-World Multi-Turn Dialogue
Songhua Yang, Hanjie Zhao, Senbin Zhu et al.
AAAI 2024 · arXiv:2308.03549
210 citations