Spotlight "sparse autoencoders" Papers
3 papers found
Conference
Among Us: A Sandbox for Measuring and Detecting Agentic Deception
Satvik Golechha, Adrià Garriga-Alonso
NeurIPS 2025 | spotlight | arXiv:2504.04072 | 8 citations
SparseMVC: Probing Cross-view Sparsity Variations for Multi-view Clustering
Ruimeng Liu, Xin Zou, Chang Tang et al.
NeurIPS 2025 | spotlight
Transferring Linear Features Across Language Models With Model Stitching
Alan Chen, Jack Merullo, Alessandro Stolfo et al.
NeurIPS 2025 | spotlight | arXiv:2506.06609 | 1 citation