Beidi Chen
17 papers · 1,408 total citations

Papers (17)
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection · ICML 2024 · arXiv · 371 citations
- KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache · ICML 2024 · arXiv · 368 citations
- Scatterbrain: Unifying Sparse and Low-rank Attention · NeurIPS 2021 · arXiv · 154 citations
- Decentralized Training of Foundation Models in Heterogeneous Environments · NeurIPS 2022 · arXiv · 126 citations
- Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer · NeurIPS 2023 · arXiv · 105 citations
- Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference · ICML 2024 · arXiv · 79 citations
- ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference · ICML 2025 · arXiv · 65 citations
- JoMA: Demystifying Multilayer Transformers via Joint Dynamics of MLP and Attention · ICLR 2024 · arXiv · 48 citations
- HexGen: Generative Inference of Large Language Model over Heterogeneous Environment · ICML 2024 · arXiv · 34 citations
- Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions · NeurIPS 2023 · arXiv · 29 citations
- LoCoCo: Dropping In Convolutions for Long Context Compression · ICML 2024 · arXiv · 16 citations
- Speculative Prefill: Turbocharging TTFT with Lightweight and Training-Free Token Importance Estimation · ICML 2025 · arXiv · 8 citations
- Zeroth-Order Fine-Tuning of LLMs with Transferable Static Sparsity · ICLR 2025 · 5 citations
- Fine-tuning Language Models over Slow Networks using Activation Quantization with Guarantees · NeurIPS 2022 · 0 citations
- Soft Prompt Recovers Compressed LLMs, Transferably · ICML 2024 · 0 citations
- H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models · NeurIPS 2023 · 0 citations
- Locality Sensitive Teaching · NeurIPS 2021 · 0 citations