"delayed generalization" Papers
3 papers found
Grokking at the Edge of Numerical Stability
Lucas Prieto, Melih Barsbey, Pedro Mediano et al.
ICLR 2025, arXiv:2501.04697
20 citations
Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model
Zhiwei Xu, Zhiyu Ni, Yixin Wang et al.
ICLR 2025, arXiv:2504.13292
6 citations
Deep Networks Always Grok and Here is Why
Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk
ICML 2024, arXiv:2402.15555
47 citations