Geometry of Lightning Self-Attention: Identifiability and Dimension

12 citations

arXiv:2408.17221

#1335 of 3827 papers in ICLR 2025

Top Authors

Nathan Henry, Giovanni Luca Marchetti, Kathlén Kohn

Topics

self-attention networks, algebraic geometry, identifiability analysis, function space dimension, parameterization fibers, single-layer model, normalized self-attention

Abstract

We consider function spaces defined by self-attention networks without normalization, and theoretically analyze their geometry. Since these networks are polynomial, we rely on tools from algebraic geometry. In particular, we study the identifiability of deep attention by providing a description of the generic fibers of the parametrization for an arbitrary number of layers and, as a consequence, compute the dimension of the function space. Additionally, for a single-layer model, we characterize the singular and boundary points. Finally, we formulate a conjectural extension of our results to normalized self-attention networks, prove it for a single layer, and numerically verify it in the deep case.
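To make the two central notions in the abstract concrete, the following is a minimal NumPy sketch of a single self-attention layer without normalization. It is an illustration under stated assumptions, not the paper's exact construction: the column-token layout of `X`, the shapes, and the names `lightning_attention`, `Q`, `K`, `V`, `M`, and `c` are choices made here. It checks that the layer is a cubic polynomial map in the input, and that two different parameter settings can define the same function, which is the kind of redundancy that fibers of the parametrization describe.

```python
import numpy as np


def lightning_attention(X, Q, K, V):
    """Single self-attention layer with no softmax (assumed convention).

    X: (d, t) input with t tokens as columns.
    Q, K: (a, d) query and key matrices.
    V: (d_out, d) value matrix.
    """
    scores = (K @ X).T @ (Q @ X)  # (t, t), quadratic in X
    return (V @ X) @ scores       # (d_out, t), cubic in X


rng = np.random.default_rng(0)
d, t, a, d_out = 4, 5, 3, 2
X = rng.standard_normal((d, t))
Q = rng.standard_normal((a, d))
K = rng.standard_normal((a, d))
V = rng.standard_normal((d_out, d))

Y = lightning_attention(X, Q, K, V)

# Polynomial structure: the output is homogeneous of degree 3 in X.
cubic_ok = np.allclose(lightning_attention(2.0 * X, Q, K, V), 8.0 * Y)

# Redundancy 1: only the product K^T Q enters the function, so
# (M Q, M^{-T} K) gives the same output for any invertible M.
M = np.eye(a) + 0.1 * rng.standard_normal((a, a))  # small perturbation of I
Q2 = M @ Q
K2 = np.linalg.inv(M).T @ K
same_fiber = np.allclose(lightning_attention(X, Q2, K2, V), Y)

# Redundancy 2: rescaling V by c and K by 1/c cancels out.
c = 2.5
scaling_ok = np.allclose(lightning_attention(X, Q, K / c, c * V), Y)

print(cubic_ok, same_fiber, scaling_ok)
```

Both reparametrizations leave the function unchanged, so the parameters cannot be recovered uniquely from the function alone. The sketch only exhibits these two elementary symmetries for one layer; it does not attempt to reproduce the paper's description of the generic fibers or its dimension count.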

Citation History

Jan 26, 2026 to Feb 13, 2026 (chart not reproduced). Recorded counts: 11 (+11), then 12 (+1).