"optimizer design" Papers
7 papers found
Adam Reduces a Unique Form of Sharpness: Theoretical Insights Near the Minimizer Manifold
Xinghan Li, Haodong Wen, Kaifeng Lyu
NeurIPS 2025 · arXiv:2511.02773 · 1 citation
Gradient Multi-Normalization for Efficient LLM Training
Meyer Scetbon, Chao Ma, Wenbo Gong et al.
NeurIPS 2025 · 3 citations
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization
Jui-Nan Yen, Si Si, Zhao Meng et al.
ICLR 2025 · arXiv:2410.20625 · 16 citations
On the Performance Analysis of Momentum Method: A Frequency Domain Perspective
Xianliang Li, Jun Luo, Zhiwei Zheng et al.
ICLR 2025 · arXiv:2411.19671 · 4 citations
The AdEMAMix Optimizer: Better, Faster, Older
Matteo Pagliardini, Pierre Ablin, David Grangier
ICLR 2025 · arXiv:2409.03137 · 27 citations
Scaling Exponents Across Parameterizations and Optimizers
Katie Everett, Lechao Xiao, Mitchell Wortsman et al.
ICML 2024 · arXiv:2407.05872 · 51 citations
Understanding Adam Optimizer via Online Learning of Updates: Adam is FTRL in Disguise
Kwangjun Ahn, Zhiyu Zhang, Yunbum Kook et al.
ICML 2024 · arXiv:2402.01567 · 22 citations