"preference optimization" Papers
60 papers found • Page 1 of 2
A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement
Hui Yuan, Yifan Zeng, Yue Wu et al.
Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization
Mingzhe Du, Anh Tuan Luu, Yue Liu et al.
A Gradient Guidance Perspective on Stepwise Preference Optimization for Diffusion Models
Joshua Tian Jin Tee, Hee Suk Yoon, Abu Hanif Muhammad Syarubany et al.
Aligning Visual Contrastive learning models via Preference Optimization
Amirabbas Afzali, Borna Khodabandeh, Ali Rasekh et al.
As Simple as Fine-tuning: LLM Alignment via Bidirectional Negative Feedback Loss
Xin Mao, Huimin Xu, Feng-Lin Li et al.
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux et al.
Avoiding exp(R) scaling in RLHF through Preference-based Exploration
Mingyu Chen, Yiding Chen, Wen Sun et al.
AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs
Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta et al.
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
Wenxuan Zhang, Philip Torr, Mohamed Elhoseiny et al.
Calibrated Multi-Preference Optimization for Aligning Diffusion Models
Kyungmin Lee, Xiaohang Li, Qifei Wang et al.
CPO: Condition Preference Optimization for Controllable Image Generation
Zonglin Lyu, Ming Li, Xinxin Liu et al.
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
Di Zhang, Jingdi Lei, Junxian Li et al.
Data Distillation for extrapolative protein design through exact preference optimization
Mostafa Karimi, Sharmi Banerjee, Tommi Jaakkola et al.
Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization
Tao Zhang, Cheng Da, Kun Ding et al.
Don’t Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models
Sohyun An, Ruochen Wang, Tianyi Zhou et al.
Factorized Learning for Temporally Grounded Video-Language Models
Wenzheng Zeng, Difei Gao, Mike Zheng Shou et al.
IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance
Jiayi Guo, Chuanhao Yan, Xingqian Xu et al.
KnowPO: Knowledge-Aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models
Ruizhe Zhang, Yongxin Xu, Yuzhen Xiao et al.
Learning from negative feedback, or positive feedback or both
Abbas Abdolmaleki, Bilal Piot, Bobak Shahriari et al.
LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
Fangxun Shu, Yue Liao, Lei Zhang et al.
LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
Guanzheng Chen, Xin Li, Michael Qizhe Shieh et al.
LongVPO: From Anchored Cues to Self-Reasoning for Long-Form Video Preference Optimization
Zhenpeng Huang, Jiaqi Li, Zihan Jia et al.
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Yushi Bai, Jiajie Zhang, Xin Lv et al.
MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization
Yougang Lyu, Lingyong Yan, Zihan Wang et al.
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing
Zhangchen Xu, Fengqing Jiang, Luyao Niu et al.
MallowsPO: Fine-Tune Your LLM with Preference Dispersions
Haoxian Chen, Hanyang Zhao, Henry Lam et al.
Meta-Learning Objectives for Preference Optimization
Carlo Alfano, Silvia Sapora, Jakob Foerster et al.
Mitigating Reward Over-optimization in Direct Alignment Algorithms with Importance Sampling
Nguyen Phuc, Ngoc-Hieu Nguyen, Duy M. H. Nguyen et al.
nvBench 2.0: Resolving Ambiguity in Text-to-Visualization through Stepwise Reasoning
Tianqi Luo, Chuhan Huang, Leixian Shen et al.
On Extending Direct Preference Optimization to Accommodate Ties
Jinghong Chen, Guangyu Yang, Weizhe Lin et al.
OptiScene: LLM-driven Indoor Scene Layout Generation via Scaled Human-aligned Data Synthesis and Multi-Stage Preference Optimization
Yixuan Yang, Zhen Luo, Tongsheng Ding et al.
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning
Qinghao Ye, Xianhan Zeng, Fu Li et al.
Predictive Preference Learning from Human Interventions
Haoyuan Cai, Zhenghao (Mark) Peng, Bolei Zhou
Preference Distillation via Value based Reinforcement Learning
Minchan Kwon, Junwon Ko, Kangil Kim et al.
Preference Optimization by Estimating the Ratio of the Data Distribution
Yeongmin Kim, HeeSun Bae, Byeonghu Na et al.
Preference Optimization for Reasoning with Pseudo Feedback
Fangkai Jiao, Geyang Guo, Xingxing Zhang et al.
Preference Optimization on Pareto Sets: On a Theory of Multi-Objective Optimization
Abhishek Roy, Geelon So, Yian Ma
Radiology Report Generation via Multi-objective Preference Optimization
Ting Xiao, Lei Shi, Peng Liu et al.
RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization
Hanyang Zhao, Genta Winata, Anirban Das et al.
Ranking-based Preference Optimization for Diffusion Models from Implicit User Feedback
Yi-Lun Wu, Bo-Kai Ruan, Chiang Tseng et al.
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization
Pritam Sarkar, Ali Etemad
Self-Evolutionary Large Language Models Through Uncertainty-Enhanced Preference Optimization
Jianing Wang, Yang Zhou, Xiaocheng Zhang et al.
SeRA: Self-Reviewing and Alignment of LLMs using Implicit Reward Margins
Jongwoo Ko, Saket Dingliwal, Bhavana Ganesh et al.
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
Teng Xiao, Yige Yuan, Zhengyu Chen et al.
SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
Ling Yang, Zhaochen Yu, Tianjun Zhang et al.
TODO: Enhancing LLM Alignment with Ternary Preferences
Yuxiang Guo, Lu Yin, Bo Jiang et al.
Token-Level Self-Play with Importance-Aware Guidance for Large Language Models
Tue Le, Hoang Tran, Quyen Tran et al.
Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization
Zichen Miao, Zhengyuan Yang, Kevin Lin et al.
Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only
Jihan Yao, Wenxuan Ding, Shangbin Feng et al.
Walking the Tightrope: Autonomous Disentangling Beneficial and Detrimental Drifts in Non-Stationary Custom-Tuning
Xiaoyu Yang, Jie Lu, En Yu