"grokking phenomenon" Papers
7 papers found
Flatness is Necessary, Neural Collapse is Not: Rethinking Generalization via Grokking
Ting Han, Linara Adilova, Henning Petzka et al.
NEURIPS 2025 (oral), arXiv:2509.17738, 3 citations
Grokking at the Edge of Numerical Stability
Lucas Prieto, Melih Barsbey, Pedro Mediano et al.
ICLR 2025, arXiv:2501.04697, 20 citations
Less is More: Local Intrinsic Dimensions of Contextual Language Models
Benjamin Matthias Ruppik, Julius von Rohrscheidt, Carel van Niekerk et al.
NEURIPS 2025, arXiv:2506.01034
Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model
Zhiwei Xu, Zhiyu Ni, Yixin Wang et al.
ICLR 2025, arXiv:2504.13292, 6 citations
Transformers Learn Low Sensitivity Functions: Investigations and Implications
Bhavya Vasudeva, Deqing Fu, Tianyi Zhou et al.
ICLR 2025, arXiv:2403.06925, 8 citations
Unveiling the Dynamics of Information Interplay in Supervised Learning
Kun Song, Zhiquan Tan, Bochao Zou et al.
ICML 2024, arXiv:2406.03999, 3 citations
Why Do You Grok? A Theoretical Analysis on Grokking Modular Addition
Mohamad Amin Mohamadi, Zhiyuan Li, Lei Wu et al.
ICML 2024, arXiv:2407.12332, 19 citations