"indirect object identification" Papers
3 papers found
Conference
Beyond Components: Singular Vector-Based Interpretability of Transformer Circuits
Areeb Ahmad, Abhinav Joshi, Ashutosh Modi
NEURIPS 2025, arXiv:2511.20273, 2 citations
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
Denis Sutter, Julian Minder, Thomas Hofmann et al.
NEURIPS 2025 (spotlight), arXiv:2507.08802, 10 citations
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
Aleksandar Makelov, Georg Lange, Neel Nanda
ICLR 2025, arXiv:2405.08366, 65 citations