Spotlight "human preference alignment" Papers
3 papers found
Aligning Text-to-Image Diffusion Models to Human Preference by Classification
Longquan Dai, Xiaolu Wei, Wang He et al.
NeurIPS 2025 · Spotlight
Inference-Time Reward Hacking in Large Language Models
Hadi Khalaf, Claudio Mayrink Verdun, Alex Oesterling et al.
NeurIPS 2025 · Spotlight · arXiv:2506.19248 · 3 citations
Less is More: Improving LLM Alignment via Preference Data Selection
Xun Deng, Han Zhong, Rui Ai et al.
NeurIPS 2025 · Spotlight · arXiv:2502.14560