"sparse dictionary learning" Papers
2 papers found
Conference
Monet: Mixture of Monosemantic Experts for Transformers
Jungwoo Park, Young Jin Ahn, Kee-Eung Kim et al.
ICLR 2025 arXiv:2412.04139
9 citations
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
Aleksandar Makelov, Georg Lange, Neel Nanda
ICLR 2025 arXiv:2405.08366
65 citations