"kl-regularized rl" Papers
3 papers found
Conference
$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training
Jin Zhou, Kaiwen Wang, Jonathan Chang et al.
NeurIPS 2025, arXiv:2502.20548
12 citations
Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions
Simon Matrenok, Skander Moalla, Caglar Gulcehre
NeurIPS 2025, arXiv:2507.08068
1 citation
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao, Chenlu Ye, Quanquan Gu et al.
NeurIPS 2025, arXiv:2411.04625
16 citations