"gradient descent dynamics" Papers
10 papers found
A Minimalist Example of Edge-of-Stability and Progressive Sharpening
Liming Liu, Zixuan Zhang, Simon Du et al.
NEURIPS 2025 · arXiv:2503.02809
1 citation
From Linear to Nonlinear: Provable Weak-to-Strong Generalization through Feature Learning
Junsoo Oh, Jerry Song, Chulhee Yun
NEURIPS 2025 · arXiv:2510.24812
2 citations
How Two-Layer Neural Networks Learn, One (Giant) Step at a Time
Yatin Dandi, Florent Krzakala, Bruno Loureiro et al.
ICLR 2025 · arXiv:2305.18270
52 citations
Loss Landscape of Shallow ReLU-like Neural Networks: Stationary Points, Saddle Escape, and Network Embedding
Frank Zhengqing Wu, Berfin Simsek, François Ged
ICLR 2025 · arXiv:2402.05626
2 citations
Quantitative convergence of trained neural networks to Gaussian processes
Andrea Agazzi, Eloy Mosig García, Dario Trevisan
NEURIPS 2025
The Computational Advantage of Depth in Learning High-Dimensional Hierarchical Targets
Yatin Dandi, Luca Pesce, Lenka Zdeborová et al.
NEURIPS 2025 · Spotlight
A Dynamical Model of Neural Scaling Laws
Blake Bordelon, Alexander Atanasov, Cengiz Pehlevan
ICML 2024 · arXiv:2402.01092
77 citations
A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks
Behrad Moniri, Donghwan Lee, Hamed Hassani et al.
ICML 2024 · arXiv:2310.07891
35 citations
The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents
Yatin Dandi, Emanuele Troiani, Luca Arnaboldi et al.
ICML 2024 · arXiv:2402.03220
39 citations
Why Do You Grok? A Theoretical Analysis on Grokking Modular Addition
Mohamad Amin Mohamadi, Zhiyuan Li, Lei Wu et al.
ICML 2024 · arXiv:2407.12332
19 citations