Balancing Gradient and Hessian Queries in Non-Convex Optimization

1 citation · #2497 of 5858 papers in NeurIPS 2025

Abstract

We develop optimization methods that offer new trade-offs between the number of gradient and Hessian computations needed to compute a critical point of a non-convex function. We provide a method that, for any twice-differentiable $f\colon \mathbb R^d \rightarrow \mathbb R$ with $L_2$-Lipschitz Hessian, an initial point with $\Delta$-bounded sub-optimality, and sufficiently small $\epsilon > 0$, outputs an $\epsilon$-critical point, i.e., a point $x$ such that $\|\nabla f(x)\| \leq \epsilon$, using $\tilde{O}(L_2^{1/4} n_H^{-1/2}\Delta\epsilon^{-9/4})$ queries to a gradient oracle and $n_H$ queries to a Hessian oracle, for any positive integer $n_H$. As a consequence, we obtain an improved gradient query complexity of $\tilde{O}(d^{1/3}L_2^{1/2}\Delta\epsilon^{-3/2})$ in the case of bounded dimension, and of $\tilde{O}(L_2^{3/4}\Delta^{3/2}\epsilon^{-9/4})$ in the case where only a single Hessian query is allowed. We obtain these results through a more general algorithm that can handle approximate Hessian computations and recovers the state-of-the-art bound of $O(L_1^{1/2}L_2^{1/4}\Delta\epsilon^{-7/4})$ gradient queries for computing an $\epsilon$-critical point, provided that $f$ also has an $L_1$-Lipschitz gradient.
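To make the gradient/Hessian trade-off concrete, the sketch below evaluates the abstract's gradient-query bound $\tilde{O}(L_2^{1/4} n_H^{-1/2}\Delta\epsilon^{-9/4})$ as a function of the Hessian budget $n_H$, ignoring the hidden polylogarithmic factors. This is purely illustrative of the stated formula, not the paper's algorithm; the function name and constant choices are ours.

```python
def grad_query_bound(L2: float, Delta: float, eps: float, n_H: int) -> float:
    """Illustrative gradient-query bound L2^{1/4} * n_H^{-1/2} * Delta * eps^{-9/4}.

    Polylog factors hidden by the tilde-O are dropped, so only the scaling
    in (L2, Delta, eps, n_H) is meaningful, not the absolute value.
    """
    return L2 ** 0.25 * n_H ** -0.5 * Delta * eps ** -2.25

# The n_H^{-1/2} dependence means quadrupling the Hessian budget
# halves the gradient-query bound.
g1 = grad_query_bound(L2=1.0, Delta=1.0, eps=1e-3, n_H=1)
g4 = grad_query_bound(L2=1.0, Delta=1.0, eps=1e-3, n_H=4)
```

Each extra factor of 4 in Hessian queries buys a factor of 2 in the gradient bound, which is the balancing act the title refers to.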

Citation History

Jan 26, 2026: 0
Jan 27, 2026: 1
Feb 3, 2026: 1
Feb 13, 2026: 1