"bradley-terry model" Papers
9 papers found
HelpSteer2-Preference: Complementing Ratings with Preferences
Zhilin Wang, Alexander Bukharin, Olivier Delalleau et al.
ICLR 2025 arXiv:2410.01257
112 citations
Multi-Objective Hyperparameter Selection via Hypothesis Testing on Reliability Graphs
Amirmohammad Farzaneh, Osvaldo Simeone
NEURIPS 2025 arXiv:2501.13018
1 citation
On Extending Direct Preference Optimization to Accommodate Ties
Jinghong Chen, Guangyu Yang, Weizhe Lin et al.
NEURIPS 2025 arXiv:2409.17431
7 citations
Preference-Based Dynamic Ranking Structure Recognition
Nan Lu, Jian Shi, Xinyu Tian
NEURIPS 2025 (oral) arXiv:2509.24493
Rethinking Reward Modeling in Preference-based Large Language Model Alignment
Hao Sun, Yunyi Shen, Jean-Francois Ton
ICLR 2025
27 citations
TODO: Enhancing LLM Alignment with Ternary Preferences
Yuxiang Guo, Lu Yin, Bo Jiang et al.
ICLR 2025 arXiv:2411.02442
5 citations
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Qining Zhang, Lei Ying
ICLR 2025 arXiv:2409.17401
10 citations
Token-level Direct Preference Optimization
Yongcheng Zeng, Guoqing Liu, Weiyu Ma et al.
ICML 2024 arXiv:2404.11999
120 citations
Transforming and Combining Rewards for Aligning Large Language Models
Zihao Wang, Chirag Nagpal, Jonathan Berant et al.
ICML 2024 arXiv:2402.00742
26 citations