Papers by Piotr Stanczyk
2 papers found
BOND: Aligning LLMs with Best-of-N Distillation
Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot-Desenonges et al.
ICLR 2025 · arXiv:2407.14622
53 citations
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes
Rishabh Agarwal, Nino Vieillard, Yongchao Zhou et al.
ICLR 2024 · arXiv:2306.13649
218 citations