Poster "preference-based feedback" Papers
2 papers found
Conference
Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits
Fan Chen, Zeyu Jia, Alexander Rakhlin et al.
NeurIPS 2025, arXiv:2505.20268
4 citations
Provably Robust DPO: Aligning Language Models with Noisy Feedback
Sayak Ray Chowdhury, Anush Kini, Nagarajan Natarajan
ICML 2024, arXiv:2403.00409
103 citations