Yonatan Belinkov

Affiliations

Technion - Israel Institute of Technology

papers

5,157

total citations

papers (15)

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

ICLR 2025 (arXiv)

2,226

citations

Locating and Editing Factual Associations in GPT

NeurIPS 2022 (arXiv)

2,069

citations

Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

ICLR 2025 (arXiv)

263

citations

Linearity of Relation Decoding in Transformer Language Models

ICLR 2024 (arXiv)

143

citations

LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations

ICLR 2025 (arXiv)

129

citations

Editing Implicit Assumptions in Text-to-Image Diffusion Models

ICCV 2023 (arXiv)

121

citations

Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking

ICLR 2024 (arXiv)

citations

Arithmetic Without Algorithms: Language Models Solve Math with a Bag of Heuristics

ICLR 2025 (arXiv)

citations

Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs

COLM 2025 (arXiv)

citations

Unsupervised Translation of Emergent Communication

AAAI 2025 (arXiv)

citations

Investigating Gender Bias in Language Models Using Causal Mediation Analysis

NeurIPS 2020

citations

IRM—when it works and when it doesn't: A test case of natural language inference

NeurIPS 2021

citations

MIB: A Mechanistic Interpretability Benchmark

Measures of Information Reflect Memorization Patterns

Accelerating the Global Aggregation of Local Explanations
