"human preference optimization" Papers
2 papers found
How do Large Language Models Navigate Conflicts between Honesty and Helpfulness?
Ryan Liu, Theodore R. Sumers, Ishita Dasgupta et al.
ICML 2024 · arXiv:2402.07282 · 28 citations
Towards Efficient Exact Optimization of Language Model Alignment
Haozhe Ji, Cheng Lu, Yilin Niu et al.
ICML 2024 · arXiv:2402.00856 · 32 citations