"transformers architecture" Papers
5 papers found
A Formal Framework for Understanding Length Generalization in Transformers
Xinting Huang, Andy Yang, Satwik Bhattamishra et al.
ICLR 2025 · arXiv:2410.02140
29 citations
Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?
Yutong Yin, Zhaoran Wang
ICLR 2025 · arXiv:2501.15857
2 citations
How do Transformers Learn Implicit Reasoning?
Jiaran Ye, Zijun Yao, Zhidian Huang et al.
NeurIPS 2025 (oral) · arXiv:2505.23653
11 citations
Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context
Xiang Cheng, Yuxin Chen, Suvrit Sra
ICML 2024 · arXiv:2312.06528
63 citations
Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot
Zixuan Wang, Stanley Wei, Daniel Hsu et al.
ICML 2024 · arXiv:2406.06893
21 citations