Yonatan Belinkov
Affiliation: Technion - Israel Institute of Technology
Papers: 15
Total citations: 5,157
Papers (15)
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models (ICLR 2025), 2,226 citations
Locating and Editing Factual Associations in GPT (NeurIPS 2022), 2,069 citations
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models (ICLR 2025), 263 citations
Linearity of Relation Decoding in Transformer Language Models (ICLR 2024), 143 citations
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations (ICLR 2025), 129 citations
Editing Implicit Assumptions in Text-to-Image Diffusion Models (ICCV 2023), 121 citations
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking (ICLR 2024), 99 citations
Arithmetic Without Algorithms: Language Models Solve Math with a Bag of Heuristics (ICLR 2025), 70 citations
MIB: A Mechanistic Interpretability Benchmark (ICML 2025), 14 citations
Measures of Information Reflect Memorization Patterns (NeurIPS 2022), 12 citations
Accelerating the Global Aggregation of Local Explanations (AAAI 2024), 7 citations
Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs (COLM 2025), 3 citations
Unsupervised Translation of Emergent Communication (AAAI 2025), 1 citation
Investigating Gender Bias in Language Models Using Causal Mediation Analysis (NeurIPS 2020), 0 citations
IRM—when it works and when it doesn't: A test case of natural language inference (NeurIPS 2021), 0 citations