Dan Alistarh
23 papers · 1,015 total citations

Papers (23)
Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning · NeurIPS 2022 · arXiv · 340 citations
Extreme Compression of Large Language Models via Additive Quantization · ICML 2024 · arXiv · 160 citations
Adaptive Gradient Quantization for Data-Parallel SGD · NeurIPS 2020 · arXiv · 100 citations
AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural Networks · NeurIPS 2021 · arXiv · 78 citations
M-FAC: Efficient Matrix-Free Approximations of Second-Order Information · NeurIPS 2021 · arXiv · 64 citations
Asynchronous Decentralized SGD with Quantized and Local Updates · NeurIPS 2021 · arXiv · 59 citations
How Well Do Sparse ImageNet Models Transfer? · CVPR 2022 · arXiv · 49 citations
RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation · ICML 2024 · arXiv · 48 citations
WoodFisher: Efficient Second-Order Approximation for Neural Network Compression · NeurIPS 2020 · arXiv · 29 citations
CAP: Correlation-Aware Pruning for Highly-Accurate Sparse Vision Models · NeurIPS 2023 · arXiv · 21 citations
Distributed Principal Component Analysis with Limited Communication · NeurIPS 2021 · arXiv · 15 citations
Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures · CVPR 2023 · arXiv · 12 citations
Knowledge Distillation Performs Partial Variance Reduction · NeurIPS 2023 · arXiv · 10 citations
Towards Tight Communication Lower Bounds for Distributed Optimisation · NeurIPS 2021 · arXiv · 10 citations
Error Feedback Can Accurately Compress Preconditioners · ICML 2024 · arXiv · 6 citations
Wasserstein Distances, Neuronal Entanglement, and Sparsity · ICLR 2025 · arXiv · 5 citations
Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models · ICML 2025 · arXiv · 4 citations
SPADE: Sparsity-Guided Debugging for Deep Neural Networks · ICML 2024 · arXiv · 2 citations
Hybrid Decentralized Optimization: Leveraging Both First- and Zeroth-Order Optimizers for Faster Convergence · AAAI 2025 · arXiv · 1 citation
Layer-wise Quantization for Quantized Optimistic Dual Averaging · ICML 2025 · arXiv · 1 citation
The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws · ICLR 2025 · arXiv · 1 citation
ZipLM: Inference-Aware Structured Pruning of Language Models · NeurIPS 2023 · 0 citations
Scalable Belief Propagation via Relaxed Scheduling · NeurIPS 2020 · 0 citations