"induction heads" Papers
6 papers found
From Shortcut to Induction Head: How Data Diversity Shapes Algorithm Selection in Transformers
Ryotaro Kawata, Yujin Song, Alberto Bietti et al.
NeurIPS 2025 (spotlight), arXiv:2512.18634
1 citation
Selective induction Heads: How Transformers Select Causal Structures in Context
Francesco D'Angelo, Francesco Croce, Nicolas Flammarion
ICLR 2025, arXiv:2509.08184
6 citations
The Dual-Route Model of Induction
Sheridan Feucht, Eric Todd, Byron C Wallace et al.
COLM 2025 (paper), arXiv:2504.03022
15 citations
What One Cannot, Two Can: Two-Layer Transformers Provably Represent Induction Heads on Any-Order Markov Chains
Chanakya Ekbote, Ashok Vardhan Makkuva, Marco Bondaschi et al.
NeurIPS 2025 (spotlight), arXiv:2508.07208
1 citation
Better & Faster Large Language Models via Multi-token Prediction
Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Roziere et al.
ICML 2024, arXiv:2404.19737
232 citations
How Transformers Learn Causal Structure with Gradient Descent
Eshaan Nichani, Alex Damian, Jason Lee
ICML 2024, arXiv:2402.14735
102 citations