Martin Wattenberg
6 papers · 1,139 total citations
Papers (6)
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
NeurIPS 2023 · arXiv · 879 citations

A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
ICML 2024 · arXiv · 165 citations

ICLR: In-Context Learning of Representations
ICLR 2025 · arXiv · 32 citations

Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models
ICML 2025 · arXiv · 32 citations

Shared Global and Local Geometry of Language Model Embeddings
COLM 2025 · arXiv · 16 citations

Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
ICML 2024 · arXiv · 15 citations