"direct preference optimization" Papers
63 papers found • Page 2 of 2
VideoDPO: Omni-Preference Alignment for Video Diffusion Generation
Runtao Liu, Haoyu Wu, Ziqiang Zheng et al.
CVPR 2025 • arXiv:2412.14167 • 75 citations
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Qining Zhang, Lei Ying
ICLR 2025 • arXiv:2409.17401 • 10 citations
Active Preference Learning for Large Language Models
William Muldrew, Peter Hayes, Mingtian Zhang et al.
ICML 2024 • arXiv:2402.08114 • 46 citations
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Andrew Lee, Xiaoyan Bai, Itamar Pres et al.
ICML 2024 • arXiv:2401.01967 • 165 citations
BRAIn: Bayesian Reward-conditioned Amortized Inference for natural language generation from feedback
Gaurav Pandey, Yatin Nandwani, Tahira Naseem et al.
ICML 2024 • arXiv:2402.02479 • 5 citations
Detecting and Preventing Hallucinations in Large Vision Language Models
Anisha Gunjal, Jihan Yin, Erhan Bas
AAAI 2024 • arXiv:2308.06394 • 264 citations
GRATH: Gradual Self-Truthifying for Large Language Models
Weixin Chen, Dawn Song, Bo Li
ICML 2024 • arXiv:2401.12292 • 7 citations
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Shusheng Xu, Wei Fu, Jiaxuan Gao et al.
ICML 2024 • arXiv:2404.10719 • 253 citations
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint
Wei Xiong, Hanze Dong, Chenlu Ye et al.
ICML 2024 • arXiv:2312.11456 • 312 citations
Provably Robust DPO: Aligning Language Models with Noisy Feedback
Sayak Ray Chowdhury, Anush Kini, Nagarajan Natarajan
ICML 2024 • arXiv:2403.00409 • 103 citations
Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences
Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban et al.
ICML 2024 • arXiv:2403.01857 • 20 citations
Token-level Direct Preference Optimization
Yongcheng Zeng, Guoqing Liu, Weiyu Ma et al.
ICML 2024 • arXiv:2404.11999 • 120 citations
Towards Efficient Exact Optimization of Language Model Alignment
Haozhe Ji, Cheng Lu, Yilin Niu et al.
ICML 2024 • arXiv:2402.00856 • 32 citations