Poster Papers Tagged "reverse-kl regularization"
2 papers found
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao, Chenlu Ye, Quanquan Gu et al.
NeurIPS 2025, arXiv:2411.04625
16 citations
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint
Wei Xiong, Hanze Dong, Chenlu Ye et al.
ICML 2024, arXiv:2312.11456
312 citations