Poster "kl divergence constraint" Papers
2 papers found
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
Gang Li, Ming Lin, Tomer Galanti et al.
NEURIPS 2025, arXiv:2505.12366
12 citations
Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections
Bo Wang, Qinyuan Cheng, Runyu Peng et al.
NEURIPS 2025, arXiv:2507.00018
15 citations