Poster papers matching "polysemantic neurons"
3 papers found
Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness
Qi Zhang, Yifei Wang, Jingyi Cui et al.
ICLR 2025 · arXiv:2410.21331 · 4 citations
Revising and Falsifying Sparse Autoencoder Feature Explanations
George Ma, Samuel Pfrommer, Somayeh Sojoudi
NeurIPS 2025
AND: Audio Network Dissection for Interpreting Deep Acoustic Models
Tung-Yu Wu, Yu-Xiang Lin, Lily Weng
ICML 2024 · arXiv:2406.16990 · 3 citations