"transformer optimization" Papers
3 papers found
GPSToken: Gaussian Parameterized Spatially-adaptive Tokenization for Image Representation and Generation
Zhengqiang ZHANG, Rongyuan Wu, Lingchen Sun et al.
NeurIPS 2025, arXiv:2509.01109
2 citations
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent
Bingrui Li, Wei Huang, Andi Han et al.
ICLR 2025, arXiv:2410.04870
9 citations
Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems
David T. Hoffmann, Simon Schrodi, Jelena Bratulić et al.
ICML 2024, arXiv:2310.12956
11 citations