Jacob Andreas

Affiliations

Microsoft, MIT

papers

1,438

total citations

papers (16)

The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks

NeurIPS 2023, arXiv

142

citations

Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling

ICML 2024, arXiv

101

citations

In-Context Language Learning: Architectures and Algorithms

ICML 2024, arXiv

Eliciting Human Preferences with Language Models

ICLR 2025, arXiv

Learning How Hard to Think: Input-Adaptive Allocation of LM Computation

ICLR 2025, arXiv

A Multimodal Automated Interpretability Agent

ICML 2024, arXiv

The Surprising Effectiveness of Test-Time Training for Few-Shot Learning

ICML 2025, arXiv

The Consensus Game: Language Model Generation via Equilibrium Search

ICLR 2024, arXiv

FIND: A Function Description Benchmark for Evaluating Interpretability Methods

NeurIPS 2023, arXiv

Toward a Visual Concept Vocabulary for GAN Latent Space

ICCV 2021, arXiv

Learning Linear Attention in Polynomial Time

NeurIPS 2025, arXiv

Teachable Reinforcement Learning via Advice Distillation

NeurIPS 2021, arXiv

Pre-Trained Language Models for Interactive Decision-Making

Compositional Explanations of Neurons

A Benchmark for Systematic Generalization in Grounded Language Understanding

Linearity of Relation Decoding in Transformer Language Models