Oral "training dynamics" Papers
3 papers found
Flatness is Necessary, Neural Collapse is Not: Rethinking Generalization via Grokking
Ting Han, Linara Adilova, Henning Petzka et al.
NeurIPS 2025, oral, arXiv:2509.17738
3 citations
The emergence of sparse attention: impact of data distribution and benefits of repetition
Nicolas Zucchet, Francesco D'Angelo, Andrew Lampinen et al.
NeurIPS 2025, oral, arXiv:2505.17863
7 citations
Why Diffusion Models Don’t Memorize: The Role of Implicit Dynamical Regularization in Training
Tony Bonnaire, Raphaël Urfin, Giulio Biroli et al.
NeurIPS 2025, oral