"human preference learning" Papers
5 papers found
Interpreting Language Reward Models via Contrastive Explanations
Junqi Jiang, Tom Bewley, Saumitra Mishra et al.
ICLR 2025 · arXiv:2411.16502
5 citations
Uncertainty and Influence aware Reward Model Refinement for Reinforcement Learning from Human Feedback
Zexu Sun, Yiju Guo, Yankai Lin et al.
ICLR 2025
5 citations
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Qining Zhang, Lei Ying
ICLR 2025 · arXiv:2409.17401
10 citations
Learning Multi-Dimensional Human Preference for Text-to-Image Generation
Sixian Zhang, Bohan Wang, Junqiang Wu et al.
CVPR 2024 · arXiv:2405.14705
84 citations
Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences
Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban et al.
ICML 2024 · arXiv:2403.01857
20 citations