"adaptive gradient methods" Papers
5 papers found
AdaGrad under Anisotropic Smoothness
Yuxing Liu, Rui Pan, Tong Zhang
ICLR 2025, arXiv:2406.15244
14 citations
Adam Reduces a Unique Form of Sharpness: Theoretical Insights Near the Minimizer Manifold
Xinghan Li, Haodong Wen, Kaifeng Lyu
NeurIPS 2025, arXiv:2511.02773
1 citation
Understanding the Generalization of Stochastic Gradient Adam in Learning Neural Networks
Xuan Tang, Han Zhang, Yuan Cao et al.
NeurIPS 2025, arXiv:2510.11354
Can We Remove the Square-Root in Adaptive Gradient Methods? A Second-Order Perspective
Wu Lin, Felix Dangel, Runa Eschenhagen et al.
ICML 2024, arXiv:2402.03496
19 citations
Faster Adaptive Decentralized Learning Algorithms
Feihu Huang, Jianyu Zhao
ICML 2024 (spotlight), arXiv:2408.09775
3 citations