Poster "preconditioned gradient descent" Papers
2 papers found
Optimality and NP-Hardness of Transformers in Learning Markovian Dynamical Functions
Yanna Ding, Songtao Lu, Yingdong Lu et al.
NeurIPS 2025, arXiv:2510.18638
Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi et al.
ICML 2024, arXiv:2410.08292, 38 citations