Geometry of Lightning Self-Attention: Identifiability and Dimension
arXiv:2408.17221
12 citations
Ranked #1335 of 3827 papers in ICLR 2025
Abstract
We consider function spaces defined by self-attention networks without normalization, and theoretically analyze their geometry. Since these networks are polynomial, we rely on tools from algebraic geometry. In particular, we study the identifiability of deep attention by providing a description of the generic fibers of the parametrization for an arbitrary number of layers and, as a consequence, compute the dimension of the function space. Additionally, for a single-layer model, we characterize the singular and boundary points. Finally, we formulate a conjectural extension of our results to normalized self-attention networks, prove it for a single layer, and numerically verify it in the deep case.
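As a rough illustration of why unnormalized attention is polynomial and why its parameters are not uniquely determined by the function, the sketch below assumes a single layer of the form X ↦ V X (K X)ᵀ (Q X) with tokens stored as columns of X. This layer form, the helper names (`matmul`, `transpose`, `lightning_attention`), and the toy matrices are assumptions for illustration and are not taken from the paper. Under this assumption the output is homogeneous of degree 3 in X, and replacing (Q, K) by (C Q, C⁻ᵀ K) for any invertible C leaves the function unchanged, since only the product Kᵀ Q enters.

```python
# Minimal sketch (assumed layer form, not the paper's code):
# unnormalized single-layer attention  X -> V X (K X)^T (Q X).
# Pure-Python integer matrices keep every check exact.

def matmul(A, B):
    """Matrix product of two lists-of-rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(row) for row in zip(*A)]

def lightning_attention(Q, K, V, X):
    """Attention without softmax; X has one token per column."""
    scores = matmul(transpose(matmul(K, X)), matmul(Q, X))  # t x t, bilinear in X
    return matmul(matmul(V, X), scores)                     # cubic in X overall

# Toy parameters and input (hypothetical values).
Q = [[1, 2], [3, 4]]
K = [[0, 1], [1, 1]]
V = [[1, 0], [2, 1]]
X = [[1, 2], [0, 1]]

out = lightning_attention(Q, K, V, X)

# Symmetry: (Q, K) -> (C Q, C^{-T} K) preserves K^T Q, hence the function.
C = [[1, 2], [0, 1]]            # invertible, determinant 1
C_inv_T = [[1, 0], [-2, 1]]     # transpose of the inverse of C
out_sym = lightning_attention(matmul(C, Q), matmul(C_inv_T, K), V, X)

# Homogeneity: scaling the input by 2 scales the output by 2**3.
X_scaled = [[2 * x for x in row] for row in X]
out_scaled = lightning_attention(Q, K, V, X_scaled)

print(out)                       # [[23, 78], [56, 190]]
print(out == out_sym)            # True
print(out_scaled == [[8 * v for v in row] for row in out])  # True
```

Two different parameter settings producing the same output is the kind of redundancy that a description of the fibers of the parametrization has to account for. This example shows only one such symmetry and makes no claim about the full generic fiber.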
Citation History
Jan 26, 2026: 0
Jan 27, 2026: 0
Feb 3, 2026: 11 (+11)
Feb 13, 2026: 12 (+1)