"human preference alignment" Papers

39 papers found

A Gradient Guidance Perspective on Stepwise Preference Optimization for Diffusion Models

Joshua Tian Jin Tee, Hee Suk Yoon, Abu Hanif Muhammad Syarubany et al.

NEURIPS 2025 (oral)

Aligning Text-to-Image Diffusion Models to Human Preference by Classification

Longquan Dai, Xiaolu Wei, Wang He et al.

NEURIPS 2025 (spotlight)

ALLaM: Large Language Models for Arabic and English

M Saiful Bari, Yazeed Alnumay, Norah Alzahrani et al.

ICLR 2025 · arXiv:2407.15390
49 citations

Beyond Human Data: Aligning Multimodal Large Language Models by Iterative Self-Evolution

Wentao Tan, Qiong Cao, Yibing Zhan et al.

AAAI 2025 · arXiv:2412.15650
7 citations

Beyond the Surface: Enhancing LLM-as-a-Judge Alignment with Human via Internal Representations

Peng Lai, Jianjie Zheng, Sijie Cheng et al.

NEURIPS 2025 · arXiv:2508.03550
3 citations

Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models

Wenxuan Zhang, Philip Torr, Mohamed Elhoseiny et al.

ICLR 2025 · arXiv:2408.15313
24 citations

Can DPO Learn Diverse Human Values? A Theoretical Scaling Law

Shawn Im, Sharon Li

NEURIPS 2025 · arXiv:2408.03459
8 citations

DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning

Ruowen Zhao, James Jun Liang Chen Ye, Zhengyi Wang et al.

ICCV 2025 · arXiv:2503.15265
35 citations

Direct Alignment with Heterogeneous Preferences

Ali Shirali, Arash Nasr-Esfahany, Abdullah Alomar et al.

NEURIPS 2025 · arXiv:2502.16320
10 citations

DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling

Xin Xie, Dong Gong

CVPR 2025 · arXiv:2412.00759
16 citations

Eliciting Human Preferences with Language Models

Belinda Li, Alex Tamkin, Noah Goodman et al.

ICLR 2025 (oral) · arXiv:2310.11589
79 citations

Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions

Bhuvanashree Murugadoss, Christian Poelitz, Ian Drosos et al.

AAAI 2025 · arXiv:2408.08781
39 citations

Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation

Xiaoying Xing, Avinab Saha, Junfeng He et al.

CVPR 2025 (highlight) · arXiv:2501.06481
4 citations

Inference-Time Reward Hacking in Large Language Models

Hadi Khalaf, Claudio Mayrink Verdun, Alex Oesterling et al.

NEURIPS 2025 (spotlight) · arXiv:2506.19248
3 citations

Less is More: Improving LLM Alignment via Preference Data Selection

Xun Deng, Han Zhong, Rui Ai et al.

NEURIPS 2025 (spotlight) · arXiv:2502.14560

LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs

Jiarui Wang, Huiyu Duan, Yu Zhao et al.

ICCV 2025 (highlight) · arXiv:2504.08358
16 citations

MetaMetrics: Calibrating Metrics for Generation Tasks Using Human Preferences

Genta Winata, David Anugraha, Lucky Susanto et al.

ICLR 2025 · arXiv:2410.02381
17 citations

Radiology Report Generation via Multi-objective Preference Optimization

Ting Xiao, Lei Shi, Peng Liu et al.

AAAI 2025 · arXiv:2412.08901
10 citations

RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization

Hanyang Zhao, Genta Winata, Anirban Das et al.

ICLR 2025 · arXiv:2410.04203
19 citations

ReNeg: Learning Negative Embedding with Reward Guidance

Xiaomin Li, Yixuan Liu, Takashi Isobe et al.

CVPR 2025 (highlight) · arXiv:2412.19637
6 citations

Reward Guided Latent Consistency Distillation

William Wang, Jiachen Li, Weixi Feng et al.

ICLR 2025 · arXiv:2403.11027
27 citations

Risk-aware Direct Preference Optimization under Nested Risk Measure

Lijun Zhang, Lin Li, Yajie Qi et al.

NEURIPS 2025 · arXiv:2505.20359
2 citations

RRM: Robust Reward Model Training Mitigates Reward Hacking

Tianqi Liu, Wei Xiong, Jie Ren et al.

ICLR 2025 · arXiv:2409.13156
50 citations

Self-Supervised Direct Preference Optimization for Text-to-Image Diffusion Models

Liang Peng, Boxi Wu, Haoran Cheng et al.

NEURIPS 2025

Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization

Noam Razin, Sadhika Malladi, Adithya Bhaskar et al.

ICLR 2025 · arXiv:2410.08847
51 citations

Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model

Wenhong Zhu, Zhiwei He, Xiaofeng Wang et al.

ICLR 2025 · arXiv:2410.18640
15 citations

WorldModelBench: Judging Video Generation Models As World Models

Dacheng Li, Yunhao Fang, Yukang Chen et al.

NEURIPS 2025 · arXiv:2502.20694
37 citations

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

Wei-Lin Chiang, Lianmin Zheng, Ying Sheng et al.

ICML 2024 · arXiv:2403.04132
1026 citations

Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases

Ziyi Zhang, Sen Zhang, Yibing Zhan et al.

ICML 2024 (oral) · arXiv:2402.08552
24 citations

DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model

Lirui Zhao, Yue Yang, Kaipeng Zhang et al.

CVPR 2024 · arXiv:2404.01342
7 citations

DreamReward: Aligning Human Preference in Text-to-3D Generation

Junliang Ye, Fangfu Liu, Qixiu Li et al.

ECCV 2024

Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations

Zilin Wang, Haolin Zhuang, Lu Li et al.

AAAI 2024 · arXiv:2312.11442
5 citations

Large-scale Reinforcement Learning for Diffusion Models

Yinan Zhang, Eric Tzeng, Yilun Du et al.

ECCV 2024 · arXiv:2401.12244
77 citations

MaxMin-RLHF: Alignment with Diverse Human Preferences

Souradip Chakraborty, Jiahao Qiu, Hui Yuan et al.

ICML 2024 · arXiv:2402.08925
88 citations

MusicRL: Aligning Music Generation to Human Preferences

Geoffrey Cideron, Sertan Girgin, Mauro Verzetti et al.

ICML 2024 · arXiv:2301.11325
616 citations

Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment

Rui Yang, Xiaoman Pan, Feng Luo et al.

ICML 2024 · arXiv:2402.10207
125 citations

Self-Rewarding Language Models

Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho et al.

ICML 2024 · arXiv:2401.10020
497 citations

Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning

Zongmeng Zhang, Yufeng Shi, Jinhua Zhu et al.

ICML 2024 · arXiv:2410.16843
2 citations

Understanding the Learning Dynamics of Alignment with Human Feedback

Shawn Im, Sharon Li

ICML 2024 · arXiv:2403.18742
18 citations