"gradient descent dynamics" Papers

10 papers found

A Minimalist Example of Edge-of-Stability and Progressive Sharpening

Liming Liu, Zixuan Zhang, Simon Du et al.

NeurIPS 2025 · arXiv:2503.02809 · 1 citation

From Linear to Nonlinear: Provable Weak-to-Strong Generalization through Feature Learning

Junsoo Oh, Jerry Song, Chulhee Yun

NeurIPS 2025 · arXiv:2510.24812 · 2 citations

How Two-Layer Neural Networks Learn, One (Giant) Step at a Time

Yatin Dandi, Florent Krzakala, Bruno Loureiro et al.

ICLR 2025 · arXiv:2305.18270 · 52 citations

Loss Landscape of Shallow ReLU-like Neural Networks: Stationary Points, Saddle Escape, and Network Embedding

Frank Zhengqing Wu, Berfin Simsek, François Ged

ICLR 2025 · arXiv:2402.05626 · 2 citations

Quantitative convergence of trained neural networks to Gaussian processes

Andrea Agazzi, Eloy Mosig García, Dario Trevisan

NeurIPS 2025

The Computational Advantage of Depth in Learning High-Dimensional Hierarchical Targets

Yatin Dandi, Luca Pesce, Lenka Zdeborová et al.

NeurIPS 2025 (Spotlight)

A Dynamical Model of Neural Scaling Laws

Blake Bordelon, Alexander Atanasov, Cengiz Pehlevan

ICML 2024 · arXiv:2402.01092 · 77 citations

A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks

Behrad Moniri, Donghwan Lee, Hamed Hassani et al.

ICML 2024 · arXiv:2310.07891 · 35 citations

The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents

Yatin Dandi, Emanuele Troiani, Luca Arnaboldi et al.

ICML 2024 · arXiv:2402.03220 · 39 citations

Why Do You Grok? A Theoretical Analysis on Grokking Modular Addition

Mohamad Amin Mohamadi, Zhiyuan Li, Lei Wu et al.

ICML 2024 · arXiv:2407.12332 · 19 citations