Poster "alignment performance" Papers
3 papers found
Conference
Information-Theoretic Reward Decomposition for Generalizable RLHF
Liyuan Mao, Haoran Xu, Amy Zhang et al.
NeurIPS 2025, arXiv:2504.06020
3 citations
Learning Dynamics of LLM Finetuning
Yi Ren, Danica Sutherland
ICLR 2025, arXiv:2407.10490
67 citations
Perfect Alignment May be Poisonous to Graph Contrastive Learning
Jingyu Liu, Huayi Tang, Yong Liu
ICML 2024, arXiv:2310.03977
4 citations