Provable and Practical Online Learning Rate Adaptation with Hypergradient Descent

7 citations · #765 of 3,340 papers in ICML 2025

Abstract

This paper investigates the convergence properties of the hypergradient descent method ($\texttt{HDM}$), a 25-year-old heuristic originally proposed for adaptive stepsize selection in stochastic first-order methods. We provide the first rigorous convergence analysis of $\texttt{HDM}$ using the online learning framework and apply this analysis to develop new state-of-the-art adaptive gradient methods with empirical and theoretical support. Notably, $\texttt{HDM}$ automatically identifies the optimal stepsize for the local optimization landscape and achieves local superlinear convergence. Our analysis explains the instability of $\texttt{HDM}$ reported in the literature and proposes efficient strategies to address it. We also develop two $\texttt{HDM}$ variants with heavy-ball and Nesterov momentum. Experiments on deterministic convex problems show that $\texttt{HDM}$ with heavy-ball momentum ($\texttt{HDM-HB}$) exhibits robust performance and significantly outperforms other adaptive first-order methods. Moreover, $\texttt{HDM-HB}$ often matches the performance of $\texttt{L-BFGS}$, an efficient and practical quasi-Newton method, using less memory and cheaper iterations.
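For context, the classic hypergradient heuristic that $\texttt{HDM}$ builds on adapts the stepsize by taking a gradient step on the loss with respect to the stepsize itself. The sketch below is a minimal illustration of that classic rule on a toy quadratic; the function names, the hyper-stepsize $\beta$, and the test problem are illustrative assumptions and not the paper's $\texttt{HDM}$ or $\texttt{HDM-HB}$ algorithms.

import numpy as np

def hypergradient_descent(grad, x0, eta0=1e-3, beta=1e-6, iters=500):
    """Classic hypergradient stepsize adaptation (a sketch, not the paper's HDM).
    Since x_t = x_{t-1} - eta * grad f(x_{t-1}), the chain rule gives
    d f(x_t) / d eta = -grad f(x_t) . grad f(x_{t-1}), so a gradient step on eta
    grows the stepsize when successive gradients align and shrinks it otherwise."""
    x = np.asarray(x0, dtype=float)
    eta = eta0
    g_prev = grad(x)
    for _ in range(iters):
        x = x - eta * g_prev              # ordinary gradient step with the current stepsize
        g = grad(x)
        eta = eta + beta * (g @ g_prev)   # hypergradient update of the stepsize; beta must be
                                          # small, since aggressive choices trigger the
                                          # instability the paper analyzes
        g_prev = g
    return x, eta

# Usage on a simple convex quadratic f(x) = 0.5 * x' A x (illustrative only).
A = np.diag([1.0, 10.0])
x_opt, eta_final = hypergradient_descent(lambda x: A @ x, x0=[5.0, 5.0])
print(x_opt, eta_final)

The inner product of consecutive gradients serving as the hypergradient is the standard form of the heuristic; the paper's contribution is the online-learning analysis of this update and the momentum variants built on top of it.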

Citation History

Jan 28, 2026: 0 citations
Feb 13, 2026: 7 citations