Poster "gradient descent" Papers
21 papers found
Adaptive backtracking for faster optimization
Joao V. Cavalcanti, Laurent Lessard, Ashia Wilson
Complexity Scaling Laws for Neural Models using Combinatorial Optimization
Lowell Weissman, Michael Krumdick, A. Abbott
Convergence and Implicit Bias of Gradient Descent on Continual Linear Classification
Hyunji Jung, Hanseul Cho, Chulhee Yun
Convergence Rates for Gradient Descent on the Edge of Stability for Overparametrised Least Squares
Lachlan MacDonald, Hancheng Min, Leandro Palma et al.
Global Convergence in Neural ODEs: Impact of Activation Functions
Tianxiang Gao, Siyuan Sun, Hailiang Liu et al.
Learning High-Degree Parities: The Crucial Role of the Initialization
Emmanuel Abbe, Elisabetta Cornacchia, Jan Hązła et al.
MAP Estimation with Denoisers: Convergence Rates and Guarantees
Scott Pesme, Giacomo Meanti, Michael Arbel et al.
New Perspectives on the Polyak Stepsize: Surrogate Functions and Negative Results
Francesco Orabona, Ryan D'Orazio
Simple and Optimal Sublinear Algorithms for Mean Estimation
Beatrice Bertolotti, Matteo Russo, Chris Schwiegelshohn et al.
Transformer Learns Optimal Variable Selection in Group-Sparse Classification
Chenyang Zhang, Xuran Meng, Yuan Cao
Transformers are almost optimal metalearners for linear classification
Roey Magen, Gal Vardi
Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought
Jianhao Huang, Zixuan Wang, Jason Lee
Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi et al.
Convergence and Trade-Offs in Riemannian Gradient Descent and Riemannian Proximal Point
David Martínez-Rubio, Christophe Roux, Sebastian Pokutta
Differentiability and Optimization of Multiparameter Persistent Homology
Luis Scoccola, Siddharth Setlur, David Loiseaux et al.
Interpreting and Improving Diffusion Models from an Optimization Perspective
Frank Permenter, Chenyang Yuan
Learning Associative Memories with Gradient Descent
Vivien Cabannes, Berfin Simsek, Alberto Bietti
Non-stationary Online Convex Optimization with Arbitrary Delays
Yuanyu Wan, Chang Yao, Mingli Song et al.
Position: Do pretrained Transformers Learn In-Context by Gradient Descent?
Lingfeng Shen, Aayush Mishra, Daniel Khashabi
Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context
Xiang Cheng, Yuxin Chen, Suvrit Sra
Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot
Zixuan Wang, Stanley Wei, Daniel Hsu et al.