Neel Nanda
8 papers, 514 total citations
Papers (8)
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
ICLR 2024 · arXiv · 185 citations
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
ICLR 2025 · arXiv · 85 citations
Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control
ICLR 2025 · arXiv · 65 citations
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability
ICML 2025 · arXiv · 58 citations
Are Sparse Autoencoders Useful? A Case Study in Sparse Probing
ICML 2025 · arXiv · 56 citations
Sparse Autoencoders Do Not Find Canonical Units of Analysis
ICLR 2025 · arXiv · 41 citations
Explorations of Self-Repair in Language Models
ICML 2024 · arXiv · 19 citations
Scaling Sparse Feature Circuits For Studying In-Context Learning
ICML 2025 · 5 citations