"large batch training" Papers
3 papers found
Conference
AdaGrad under Anisotropic Smoothness
Yuxing Liu, Rui Pan, Tong Zhang
ICLR 2025, arXiv:2406.15244
14 citations
Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training
Minhak Song, Beomhan Baek, Kwangjun Ahn et al.
NeurIPS 2025, arXiv:2507.09846
2 citations
Understanding outer learning rates in Local SGD
Ahmed Khaled, Satyen Kale, Arthur Douillard et al.
NeurIPS 2025