Poster "adaptive optimizers" Papers
4 papers found
From Attention to Activation: Unraveling the Enigmas of Large Language Models
Prannay Kaul, Chengcheng Ma, Ismail Elezi et al.
ICLR 2025, arXiv:2410.17174
8 citations
Understanding Optimization in Deep Learning with Central Flows
Jeremy Cohen, Alex Damian, Ameet Talwalkar et al.
ICLR 2025, arXiv:2410.24206
22 citations
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec, Felix Dangel, Sidak Pal Singh
ICLR 2025, arXiv:2410.10986
10 citations
MADA: Meta-Adaptive Optimizers Through Hyper-Gradient Descent
Kaan Ozkara, Can Karakus, Parameswaran Raman et al.
ICML 2024, arXiv:2401.08893
6 citations