Poster "adamw optimization" Papers
3 papers found
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.
ICLR 2025 · arXiv:2502.15938
24 citations
Implicit Bias of AdamW: $\ell_\infty$-Norm Constrained Optimization
Shuo Xie, Zhiyuan Li
ICML 2024
Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
Atli Kosson, Bettina Messmer, Martin Jaggi
ICML 2024 · arXiv:2305.17212
33 citations