"neuron interpretability" Papers
3 papers found
Monet: Mixture of Monosemantic Experts for Transformers
Jungwoo Park, Young Jin Ahn, Kee-Eung Kim et al.
ICLR 2025, arXiv:2412.04139
9 citations
What should a neuron aim for? Designing local objective functions based on information theory
Andreas C. Schneider, Valentin Neuhaus, David Ehrlich et al.
ICLR 2025, arXiv:2412.02482
5 citations
Linear Explanations for Individual Neurons
Tuomas Oikarinen, Lily Weng
ICML 2024, arXiv:2405.06855
15 citations