"alignment algorithms" Papers
3 papers found
Conference
Diverse Preference Learning for Capabilities and Alignment
Stewart Slocum, Asher Parker-Sartori, Dylan Hadfield-Menell
ICLR 2025, arXiv:2511.08594
24 citations
On a Connection Between Imitation Learning and RLHF
Teng Xiao, Yige Yuan, Mingxiao Li et al.
ICLR 2025, arXiv:2503.05079
14 citations
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Andrew Lee, Xiaoyan Bai, Itamar Pres et al.
ICML 2024, arXiv:2401.01967
165 citations