"human feedback alignment" Papers
11 papers found
Automated Filtering of Human Feedback Data for Aligning Text-to-Image Diffusion Models
Yongjin Yang, Sihyeon Kim, Hojung Jung et al.
ICLR 2025 · arXiv:2410.10166
3 citations
Curriculum Direct Preference Optimization for Diffusion and Consistency Models
Florinel Croitoru, Vlad Hondru, Radu Tudor Ionescu et al.
CVPR 2025 · arXiv:2405.13637
24 citations
Earlier Tokens Contribute More: Learning Direct Preference Optimization From Temporal Decay Perspective
Ruichen Shao, Bei Li, Gangao Liu et al.
ICLR 2025 (oral) · arXiv:2502.14340
7 citations
Generative RLHF-V: Learning Principles from Multi-modal Human Preference
Jiayi Zhou, Jiaming Ji, Boyuan Chen et al.
NEURIPS 2025 · arXiv:2505.18531
7 citations
Reward Learning from Multiple Feedback Types
Yannick Metz, Andras Geiszl, Raphaël Baur et al.
ICLR 2025 · arXiv:2502.21038
5 citations
Scalable Valuation of Human Feedback through Provably Robust Model Alignment
Masahiro Fujisawa, Masaki Adachi, Michael A. Osborne
NEURIPS 2025 · arXiv:2505.17859
1 citation
Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models
Ángela López-Cardona, Carlos Segura, Alexandros Karatzoglou et al.
ICLR 2025 · arXiv:2410.01532
8 citations
Improving Prototypical Visual Explanations with Reward Reweighing, Reselection, and Retraining
Aaron Li, Robin Netzorg, Zhihan Cheng et al.
ICML 2024 · arXiv:2307.03887
4 citations
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu, Michael Jordan, Jiantao Jiao
ICML 2024 · arXiv:2401.16335
48 citations
Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion
Sanghyun Kim, Seohyeon Jung, Balhae Kim et al.
ECCV 2024 · arXiv:2407.21032
10 citations
ULTRAFEEDBACK: Boosting Language Models with Scaled AI Feedback
Ganqu Cui, Lifan Yuan, Ning Ding et al.
ICML 2024 · arXiv:2310.01377
214 citations