"preconditioned gradient descent" Papers
3 papers found
Optimality and NP-Hardness of Transformers in Learning Markovian Dynamical Functions
Yanna Ding, Songtao Lu, Yingdong Lu et al.
NeurIPS 2025 · arXiv:2510.18638
Optimization Inspired Few-Shot Adaptation for Large Language Models
Boyan Gao, Xin Wang, Yibo Yang et al.
NeurIPS 2025 (Spotlight) · arXiv:2505.19107
Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi et al.
ICML 2024 · arXiv:2410.08292
38 citations