"gradient descent" Papers

27 papers found

Adaptive backtracking for faster optimization

Joao V. Cavalcanti, Laurent Lessard, Ashia Wilson

ICLR 2025 · 3 citations

Complexity Scaling Laws for Neural Models using Combinatorial Optimization

Lowell Weissman, Michael Krumdick, A. Abbott

NeurIPS 2025 · arXiv:2506.12932

Convergence and Implicit Bias of Gradient Descent on Continual Linear Classification

Hyunji Jung, Hanseul Cho, Chulhee Yun

ICLR 2025 · arXiv:2504.12712 · 4 citations

Convergence Rates for Gradient Descent on the Edge of Stability for Overparametrised Least Squares

Lachlan MacDonald, Hancheng Min, Leandro Palma et al.

NeurIPS 2025 · arXiv:2510.17506

From Logistic Regression to the Perceptron Algorithm: Exploring Gradient Descent with Large Step Sizes

Alexander Tyurin

AAAI 2025 · arXiv:2412.08424 · 2 citations

Global Convergence in Neural ODEs: Impact of Activation Functions

Tianxiang Gao, Siyuan Sun, Hailiang Liu et al.

ICLR 2025 · arXiv:2509.22436 · 3 citations

Hamiltonian Descent Algorithms for Optimization: Accelerated Rates via Randomized Integration Time

Qiang Fu, Andre Wibisono

NeurIPS 2025 (spotlight) · arXiv:2505.12553 · 2 citations

Learning Complexity of Gradient Descent and Conjugate Gradient Algorithms

Xianqi Jiao, Jia Liu, Zhiping Chen

AAAI 2025 · arXiv:2412.13473 · 2 citations

Learning High-Degree Parities: The Crucial Role of the Initialization

Emmanuel Abbe, Elisabetta Cornacchia, Jan Hązła et al.

ICLR 2025 · arXiv:2412.04910 · 5 citations

MAP Estimation with Denoisers: Convergence Rates and Guarantees

Scott Pesme, Giacomo Meanti, Michael Arbel et al.

NeurIPS 2025 · arXiv:2507.15397 · 2 citations

New Perspectives on the Polyak Stepsize: Surrogate Functions and Negative Results

Francesco Orabona, Ryan D'Orazio

NeurIPS 2025 · arXiv:2505.20219 · 6 citations

Simple and Optimal Sublinear Algorithms for Mean Estimation

Beatrice Bertolotti, Matteo Russo, Chris Schwiegelshohn et al.

NeurIPS 2025 · arXiv:2406.05254

The Implicit Bias of Structured State Space Models Can Be Poisoned With Clean Labels

Yonatan Slutzky, Yotam Alexander, Noam Razin et al.

NeurIPS 2025 (spotlight) · arXiv:2410.10473 · 2 citations

Transformer Learns Optimal Variable Selection in Group-Sparse Classification

Chenyang Zhang, Xuran Meng, Yuan Cao

ICLR 2025 · arXiv:2504.08638 · 4 citations

Transformers are almost optimal metalearners for linear classification

Roey Magen, Gal Vardi

NeurIPS 2025 · arXiv:2510.19797 · 1 citation

Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought

Jianhao Huang, Zixuan Wang, Jason Lee

ICLR 2025 · arXiv:2502.21212 · 22 citations

Asymptotics of feature learning in two-layer networks after one gradient-step

Hugo Cui, Luca Pesce, Yatin Dandi et al.

ICML 2024 (spotlight) · arXiv:2402.04980 · 26 citations

Beyond Implicit Bias: The Insignificance of SGD Noise in Online Learning

Nikhil Vyas, Depen Morwani, Rosie Zhao et al.

ICML 2024 (spotlight) · arXiv:2306.08590 · 7 citations

Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?

Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi et al.

ICML 2024 · arXiv:2410.08292 · 38 citations

Convergence and Trade-Offs in Riemannian Gradient Descent and Riemannian Proximal Point

David Martínez-Rubio, Christophe Roux, Sebastian Pokutta

ICML 2024 · arXiv:2403.10429 · 3 citations

Differentiability and Optimization of Multiparameter Persistent Homology

Luis Scoccola, Siddharth Setlur, David Loiseaux et al.

ICML 2024 · arXiv:2406.07224 · 11 citations

Interpreting and Improving Diffusion Models from an Optimization Perspective

Frank Permenter, Chenyang Yuan

ICML 2024 · arXiv:2306.04848 · 14 citations

Learning Associative Memories with Gradient Descent

Vivien Cabannes, Berfin Simsek, Alberto Bietti

ICML 2024

Non-stationary Online Convex Optimization with Arbitrary Delays

Yuanyu Wan, Chang Yao, Mingli Song et al.

ICML 2024

Position: Do pretrained Transformers Learn In-Context by Gradient Descent?

Lingfeng Shen, Aayush Mishra, Daniel Khashabi

ICML 2024

Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context

Xiang Cheng, Yuxin Chen, Suvrit Sra

ICML 2024 · arXiv:2312.06528 · 63 citations

Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot

Zixuan Wang, Stanley Wei, Daniel Hsu et al.

ICML 2024 · arXiv:2406.06893 · 21 citations