Emergence of a High-Dimensional Abstraction Phase in Language Transformers

arXiv:2405.15471
34 citations · #512 of 3827 papers in ICLR 2025

Abstract

A language model (LM) is a mapping from a linguistic context to an output token. However, much remains unknown about this mapping, including how its geometric properties relate to its function. We take a high-level geometric approach to its analysis, observing, across five pre-trained transformer-based LMs and three input datasets, a distinct phase characterized by high intrinsic dimensionality. During this phase, representations (1) correspond to the first full linguistic abstraction of the input; (2) are the first to viably transfer to downstream tasks; and (3) predict each other across different LMs. Moreover, we find that an earlier onset of the phase strongly predicts better language modelling performance. In short, our results suggest that a central high-dimensionality phase underlies core linguistic processing in many common LM architectures.
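The phase the abstract describes is identified by tracking the intrinsic dimensionality (ID) of token representations layer by layer. As a minimal sketch of how such a profile can be computed, the snippet below runs a pre-trained model and applies the TwoNN MLE estimator (Facco et al., 2017) to each layer's hidden states. The model choice ("gpt2"), the toy corpus, and the use of TwoNN rather than the paper's own estimator are illustrative assumptions, not the authors' exact setup.

# Sketch: layer-wise intrinsic dimensionality (ID) profile of a pre-trained
# LM, using the TwoNN MLE estimator (Facco et al., 2017). The model name
# ("gpt2") and the tiny corpus are assumptions for illustration only.
import numpy as np
import torch
from sklearn.neighbors import NearestNeighbors
from transformers import AutoModel, AutoTokenizer

def twonn_id(points: np.ndarray) -> float:
    """TwoNN MLE: d = N / sum(log(r2 / r1)), where r1, r2 are each
    point's first- and second-nearest-neighbor distances."""
    dists, _ = NearestNeighbors(n_neighbors=3).fit(points).kneighbors(points)
    r1, r2 = dists[:, 1], dists[:, 2]   # column 0 is the point itself
    mu = r2[r1 > 0] / r1[r1 > 0]        # drop exact-duplicate points
    return len(mu) / np.log(mu).sum()

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True).eval()

# Toy input; a stable ID estimate needs thousands of token representations.
text = ("Language models map a linguistic context to an output token, "
        "and the geometry of their hidden representations changes "
        "markedly from layer to layer.")
with torch.no_grad():
    enc = tok(text, return_tensors="pt")
    hidden = model(**enc).hidden_states  # (n_layers + 1) tensors of [1, T, D]

for layer, h in enumerate(hidden):
    reps = h[0].numpy()                  # one point per token position
    print(f"layer {layer:2d}  ID ~ {twonn_id(reps):.1f}")

Plotted over layers, the paper's finding is that this profile rises into a distinct high-ID band in early-to-middle layers, and that an earlier onset of that band tracks better language modelling performance.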

Citation History

Date           Citations
Jan 26, 2026   32
Jan 27, 2026   32
Feb 3, 2026    32
Feb 13, 2026   34