Poster "reward model learning" Papers
6 papers found
DreamReward: Aligning Human Preference in Text-to-3D Generation
Junliang Ye, Fangfu Liu, Qixiu Li et al.
ECCV 2024
Improving Prototypical Visual Explanations with Reward Reweighing, Reselection, and Retraining
Aaron Li, Robin Netzorg, Zhihan Cheng et al.
ICML 2024 · arXiv:2307.03887
4 citations
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Shusheng Xu, Wei Fu, Jiaxuan Gao et al.
ICML 2024 · arXiv:2404.10719
253 citations
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu, Michael Jordan, Jiantao Jiao
ICML 2024 · arXiv:2401.16335
48 citations
PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling
Utsav Singh, Wesley A. Suttle, Brian Sadler et al.
ICML 2024 · arXiv:2404.13423
5 citations
Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences
Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban et al.
ICML 2024 · arXiv:2403.01857
20 citations