Poster "feed-forward layers" Papers
2 papers found
Distributional Associations vs In-Context Reasoning: A Study of Feed-forward and Attention Layers
Lei Chen, Joan Bruna, Alberto Bietti
ICLR 2025, arXiv:2406.03068
8 citations
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
Rahul Ramesh, Ekdeep Singh Lubana, Mikail Khona et al.
ICML 2024, arXiv:2311.12997
15 citations