"deterministic mdps" Papers
2 papers found
Conference
$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training
Jin Zhou, Kaiwen Wang, Jonathan Chang et al.
NeurIPS 2025, arXiv:2502.20548
12 citations
Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits
Fan Chen, Zeyu Jia, Alexander Rakhlin et al.
NeurIPS 2025, arXiv:2505.20268
4 citations