Eran Malach

papers

707 total citations

papers (16)

Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit

NEURIPS 2022, arXiv

164 citations

Let Me Think! A Long Chain of Thought Can Be Worth Exponentially Many Short Ones

NEURIPS 2025, arXiv

citations

A Taxonomy of Transcendence

COLM 2025, arXiv

citations

Pareto Frontiers in Deep Feature Learning: Data, Compute, Width, and Luck

NEURIPS 2023

citations

The Implications of Local Correlation on Learning Some Deep Functions

NEURIPS 2020

citations

papers (16)

Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit

Repeat After Me: Transformers are Better than State Space Models at Copying

Learning Parities with Neural Networks

Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining

Auto-Regressive Next-Token Predictors are Universal Learners

A New Perspective on Shampoo's Preconditioner

On the Power of Differentiable Learning versus PAC and SQ Learning

Universal Length Generalization with Turing Programs

Don’t Stop Me Now: Embedding Based Scheduling for LLMs

Knowledge Distillation: Bad Models Can Be Good Role Models

To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning

Mixture of Parrots: Experts improve memorization more than reasoning

Let Me Think! A Long Chain of Thought Can Be Worth Exponentially Many Short Ones

A Taxonomy of Transcendence

Pareto Frontiers in Deep Feature Learning: Data, Compute, Width, and Luck

The Implications of Local Correlation on Learning Some Deep Functions
