Poster "policy alignment" Papers
3 papers found
AIR-BENCH 2024: A Safety Benchmark based on Regulation and Policies Specified Risk Categories
Yi Zeng, Yu Yang, Andy Zhou et al.
ICLR 2025
15 citations
Direct Alignment with Heterogeneous Preferences
Ali Shirali, Arash Nasr-Esfahany, Abdullah Alomar et al.
NeurIPS 2025, arXiv:2502.16320
10 citations
Pairwise Calibrated Rewards for Pluralistic Alignment
Daniel Halpern, Evi Micha, Ariel Procaccia et al.
NeurIPS 2025, arXiv:2506.06298
2 citations