Poster "weight decay" Papers
3 papers found
Power Lines: Scaling laws for weight decay and batch size in LLM pre-training
Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.
NeurIPS 2025, arXiv:2505.13738
17 citations
Understanding the Generalization of Stochastic Gradient Adam in Learning Neural Networks
Xuan Tang, Han Zhang, Yuan Cao et al.
NeurIPS 2025, arXiv:2510.11354
Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse
Arthur Jacot, Peter Súkeník, Zihan Wang et al.
ICLR 2025, arXiv:2410.04887
10 citations