"critical batch size" Papers
3 papers found
Conference
Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training
Will Merrill, Shane Arora, Dirk Groeneveld et al.
NEURIPS 2025spotlightarXiv:2505.23971
6
citations
How Does Critical Batch Size Scale in Pre-training?
Hanlin Zhang, Depen Morwani, Nikhil Vyas et al.
ICLR 2025arXiv:2410.21676
43
citations
Power Lines: Scaling laws for weight decay and batch size in LLM pre-training
Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.
NEURIPS 2025arXiv:2505.13738
17
citations