"transformer training dynamics" Papers
3 papers found
From Shortcut to Induction Head: How Data Diversity Shapes Algorithm Selection in Transformers
Ryotaro Kawata, Yujin Song, Alberto Bietti et al.
NeurIPS 2025 (spotlight), arXiv:2512.18634
1 citation
What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers
Pulkit Gopalani, Wei Hu
NeurIPS 2025, arXiv:2506.13688
2 citations
How Do Nonlinear Transformers Learn and Generalize in In-Context Learning?
Hongkang Li, Meng Wang, Songtao Lu et al.
ICML 2024, arXiv:2402.15607
34 citations