"human feedback" Papers
6 papers found

InterMT: Multi-Turn Interleaved Preference Alignment with Human Feedback
Boyuan Chen, Donghai Hong, Jiaming Ji et al.
NeurIPS 2025 (Spotlight) · arXiv:2505.23950 · 1 citation

MallowsPO: Fine-Tune Your LLM with Preference Dispersions
Haoxian Chen, Hanyang Zhao, Henry Lam et al.
ICLR 2025 · arXiv:2405.14953 · 15 citations

Mitigating Reward Over-optimization in Direct Alignment Algorithms with Importance Sampling
Nguyen Phuc, Ngoc-Hieu Nguyen, Duy M. H. Nguyen et al.
NeurIPS 2025 · arXiv:2506.08681

A Unified Linear Programming Framework for Offline Reward Learning from Human Demonstrations and Feedback
Kihyun Kim, Jiawei Zhang, Asuman Ozdaglar et al.
ICML 2024 · arXiv:2405.12421 · 2 citations

Efficient Exploration for LLMs
Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao et al.
ICML 2024 · arXiv:2402.00396 · 37 citations

Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff et al.
ICML 2024 (Spotlight) · arXiv:2402.01306 · 871 citations