Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation

7citations

arXiv:2405.16504

citations

#1855

in ICLR 2025

of 3827 papers

Top Authors

Data Points

Top Authors

Itamar Zimerman ameen ali ali Lior Wolf

Abstract

Recent advances in efficient sequence modeling have led to attention-free layers, such as Mamba, RWKV, and various gated RNNs, all featuring sub-quadratic complexity in sequence length and excellent scaling properties, enabling the construction of a new type of foundation models. In this paper, we present a unified view of these models, formulating such layers as implicit causal self-attention layers. The formulation includes most of their sub-components and is not limited to a specific part of the architecture. The framework compares the underlying mechanisms on similar grounds for different layers and provides a direct means for applying explainability methods. Our experiments show that our attention matrices and attribution method outperform an alternative and a more limited formulation that was recently proposed for Mamba. For the other architectures for which our method is the first to provide such a view, our method is effective and competitive in the relevant metrics compared to the results obtained by state-of-the-art Transformer explainability methods. Our code is publicly available.

Citation History

Jan 25, 2026

Jan 26, 2026

Jan 28, 2026

Feb 13, 2026

7+7

Feb 13, 2026