"batch size effects" Papers
4 papers found
How Two-Layer Neural Networks Learn, One (Giant) Step at a Time
Yatin Dandi, Florent Krzakala, Bruno Loureiro et al.
ICLR 2025, arXiv:2305.18270, 52 citations
Understanding the Generalization of Stochastic Gradient Adam in Learning Neural Networks
Xuan Tang, Han Zhang, Yuan Cao et al.
NeurIPS 2025, arXiv:2510.11354
Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning
Nikhil Vyas, Depen Morwani, Rosie Zhao et al.
ICML 2024 (spotlight), arXiv:2306.08590, 7 citations
Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan et al.
ICML 2024, arXiv:2306.04815, 25 citations