Abstract
The induction head mechanism is part of the computational circuits for in-context learning (ICL) that enable large language models (LLMs) to adapt to new tasks without fine-tuning. Most existing work explains the training dynamics behind the acquisition of this powerful mechanism. However, it remains unclear how a transformer extracts information from long contexts and coordinates it with the global knowledge acquired during pretraining. This paper treats weight matrices as associative memories to investigate how an induction head operates over long contexts and how the model balances in-context and global bigram knowledge in next-token prediction. We theoretically analyze the representations stored in the learned associative memories of the attention layers and the resulting logits when the transformer is given prompts generated by a bigram model. In experiments, we design specific prompts to evaluate whether the outputs of the trained transformer align with the theoretical results.
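To make the associative-memory viewpoint concrete, here is a minimal illustrative formulation; the notation $e_a$, $u_b$, and the bigram set $\mathcal{B}$ are assumptions for this sketch rather than the paper's exact definitions. A weight matrix that stores a set of bigram transitions can be written as a sum of outer products of output and input embeddings, so that applying it to a stored input approximately retrieves the associated output:
\[
W \;=\; \sum_{(a,b)\in\mathcal{B}} u_b\, e_a^{\top},
\qquad
W e_{a'} \;\approx\; u_{b'} \quad \text{whenever } (a',b') \in \mathcal{B},
\]
assuming the input embeddings $\{e_a\}$ are approximately orthonormal, so that the logits $\langle u_{b}, W e_{a'} \rangle$ are large precisely for the stored continuation $b'$ of $a'$.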