"linear representation hypothesis" Papers
3 papers found
Conference
LLM Unlearning via Neural Activation Redirection
William Shen, Xinchi Qiu, Meghdad Kurmanji et al.
NeurIPS 2025, arXiv:2502.07218
16 citations
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
Denis Sutter, Julian Minder, Thomas Hofmann et al.
NeurIPS 2025 (spotlight), arXiv:2507.08802
10 citations
The Linear Representation Hypothesis and the Geometry of Large Language Models
Kiho Park, Yo Joong Choe, Victor Veitch
ICML 2024, arXiv:2311.03658
363 citations