Poster "human preference alignment" Papers

26 papers found

ALLaM: Large Language Models for Arabic and English

M Saiful Bari, Yazeed Alnumay, Norah Alzahrani et al.

ICLR 2025 · arXiv:2407.15390 · 49 citations

Beyond the Surface: Enhancing LLM-as-a-Judge Alignment with Human via Internal Representations

Peng Lai, Jianjie Zheng, Sijie Cheng et al.

NEURIPS 2025 · arXiv:2508.03550 · 3 citations

Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models

Wenxuan Zhang, Philip Torr, Mohamed Elhoseiny et al.

ICLR 2025 · arXiv:2408.15313 · 24 citations

Can DPO Learn Diverse Human Values? A Theoretical Scaling Law

Shawn Im, Sharon Li

NEURIPS 2025 · arXiv:2408.03459 · 8 citations

DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning

Ruowen Zhao, James Jun Liang Chen Ye, Zhengyi Wang et al.

ICCV 2025 · arXiv:2503.15265 · 35 citations

Direct Alignment with Heterogeneous Preferences

Ali Shirali, Arash Nasr-Esfahany, Abdullah Alomar et al.

NEURIPS 2025 · arXiv:2502.16320 · 10 citations

DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling

Xin Xie, Dong Gong

CVPR 2025 · arXiv:2412.00759 · 16 citations

MetaMetrics: Calibrating Metrics for Generation Tasks Using Human Preferences

Genta Winata, David Anugraha, Lucky Susanto et al.

ICLR 2025 · arXiv:2410.02381 · 17 citations

RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization

Hanyang Zhao, Genta Winata, Anirban Das et al.

ICLR 2025 · arXiv:2410.04203 · 19 citations

Reward Guided Latent Consistency Distillation

William Wang, Jiachen Li, Weixi Feng et al.

ICLR 2025 · arXiv:2403.11027 · 27 citations

Risk-aware Direct Preference Optimization under Nested Risk Measure

Lijun Zhang, Lin Li, Yajie Qi et al.

NEURIPS 2025 · arXiv:2505.20359 · 2 citations

RRM: Robust Reward Model Training Mitigates Reward Hacking

Tianqi Liu, Wei Xiong, Jie Ren et al.

ICLR 2025 · arXiv:2409.13156 · 50 citations

Self-Supervised Direct Preference Optimization for Text-to-Image Diffusion Models

Liang Peng, Boxi Wu, Haoran Cheng et al.

NEURIPS 2025

Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization

Noam Razin, Sadhika Malladi, Adithya Bhaskar et al.

ICLR 2025 · arXiv:2410.08847 · 51 citations

Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model

Wenhong Zhu, Zhiwei He, Xiaofeng Wang et al.

ICLR 2025 · arXiv:2410.18640 · 15 citations

WorldModelBench: Judging Video Generation Models As World Models

Dacheng Li, Yunhao Fang, Yukang Chen et al.

NEURIPS 2025 · arXiv:2502.20694 · 37 citations

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

Wei-Lin Chiang, Lianmin Zheng, Ying Sheng et al.

ICML 2024 · arXiv:2403.04132 · 1026 citations

DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model

Lirui Zhao, Yue Yang, Kaipeng Zhang et al.

CVPR 2024 · arXiv:2404.01342 · 7 citations

DreamReward: Aligning Human Preference in Text-to-3D Generation

Junliang Ye, Fangfu Liu, Qixiu Li et al.

ECCV 2024

Large-scale Reinforcement Learning for Diffusion Models

Yinan Zhang, Eric Tzeng, Yilun Du et al.

ECCV 2024 · arXiv:2401.12244 · 77 citations

MaxMin-RLHF: Alignment with Diverse Human Preferences

Souradip Chakraborty, Jiahao Qiu, Hui Yuan et al.

ICML 2024 · arXiv:2402.08925 · 88 citations

MusicRL: Aligning Music Generation to Human Preferences

Geoffrey Cideron, Sertan Girgin, Mauro Verzetti et al.

ICML 2024 · arXiv:2301.11325 · 616 citations

Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment

Rui Yang, Xiaoman Pan, Feng Luo et al.

ICML 2024 · arXiv:2402.10207 · 125 citations

Self-Rewarding Language Models

Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho et al.

ICML 2024 · arXiv:2401.10020 · 497 citations

Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning

Zongmeng Zhang, Yufeng Shi, Jinhua Zhu et al.

ICML 2024 · arXiv:2410.16843 · 2 citations

Understanding the Learning Dynamics of Alignment with Human Feedback

Shawn Im, Sharon Li

ICML 2024 · arXiv:2403.18742 · 18 citations