LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics

26 citations · #676 of 3827 papers in ICLR 2025

Abstract

We introduce LDAdam, a memory-efficient optimizer for training large models that performs adaptive optimization steps within lower-dimensional subspaces while consistently exploring the full parameter space during training. This strategy keeps the optimizer's memory footprint to a fraction of the model size. LDAdam relies on a new projection-aware update rule for the optimizer states that allows for transitioning between subspaces, i.e., for estimating the statistics of the projected gradients. To mitigate the errors due to low-rank projection, LDAdam integrates a new generalized error feedback mechanism, which explicitly accounts for both gradient and optimizer state compression. We prove the convergence of LDAdam under standard assumptions and provide empirical evidence that it allows for efficient fine-tuning and pre-training of language models.
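To make the mechanics concrete, here is a minimal, self-contained NumPy sketch of the three ingredients the abstract names: low-rank projection of gradients, a projection-aware transition of optimizer statistics between subspaces, and error feedback. This is not the authors' algorithm: the top-r SVD choice of basis, the entrywise-squared transition for the second moment, the omitted bias correction, and all function and state names (`ldadam_like_step`, `init_state`) are illustrative assumptions.

```python
import numpy as np

def init_state(n_rows, n_cols, rank):
    """Optimizer state for the sketch below (all names are illustrative)."""
    return {
        "m": np.zeros((rank, n_cols)),        # first moment, in the subspace
        "v": np.zeros((rank, n_cols)),        # second moment, in the subspace
        "error": np.zeros((n_rows, n_cols)),  # error-feedback accumulator
        "basis": np.zeros((n_rows, rank)),    # previous projection basis
    }

def ldadam_like_step(param, grad, state, rank=4, lr=1e-3,
                     beta1=0.9, beta2=0.999, eps=1e-8):
    # Error feedback: fold back what previous projections discarded.
    corrected = grad + state["error"]

    # Low-rank projection: keep the top-`rank` left singular vectors.
    U, _, _ = np.linalg.svd(corrected, full_matrices=False)
    P = U[:, :rank]                       # new basis, shape (n_rows, rank)
    g_low = P.T @ corrected               # compressed gradient, (rank, n_cols)

    # Projection-aware transition: re-express the moments, which live in
    # the old subspace, in the new one. T = P^T P_old maps old coordinates
    # to new ones; squaring T entrywise for `v` is a naive stand-in for
    # the paper's more careful second-moment rule.
    T = P.T @ state["basis"]
    state["m"] = beta1 * (T @ state["m"]) + (1 - beta1) * g_low
    state["v"] = beta2 * ((T**2) @ state["v"]) + (1 - beta2) * g_low**2

    # Accumulate the projection residual for the next step's correction.
    state["error"] = corrected - P @ g_low
    state["basis"] = P

    # Adam-style step (bias correction omitted), mapped back to full space.
    return param - lr * (P @ (state["m"] / (np.sqrt(state["v"]) + eps)))

# Usage on a toy weight matrix:
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
state = init_state(64, 32, rank=4)
for _ in range(10):
    g = W / 32.0                          # placeholder gradient
    W = ldadam_like_step(W, g, state)
```

Note how the moments are stored as (rank × n_cols) arrays rather than full (n_rows × n_cols) ones, which is the memory saving the abstract refers to; the full-size error buffer in this sketch is a simplification, and managing that cost is part of what the paper's generalized error feedback mechanism addresses.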

Citation History

Jan 25, 2026: 0
Jan 27, 2026: 0
Jan 28, 2026: 0
Feb 13, 2026: 26