"weight quantization" Papers

11 papers found

Cauchy-Schwarz Regularizers

Sueda Taner, Ziyi Wang, Christoph Studer

ICLR 2025 · arXiv:2503.01639

Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression

Xi Zhang, Xiaolin Wu, Jiamang Wang et al.

NeurIPS 2025 · arXiv:2510.20984

S$^2$NN: Sub-bit Spiking Neural Networks

Wenjie Wei, Malu Zhang, Jieyuan (Eric) Zhang et al.

NeurIPS 2025 · arXiv:2509.24266

Training-Free Activation Sparsity in Large Language Models

James Liu, Pragaash Ponnusamy, Tianle Cai et al.

ICLR 2025 · arXiv:2408.14690
39 citations

VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers

Juncan Deng, Shuaiting Li, Zeyu Wang et al.

AAAI 2025 · arXiv:2408.17131
11 citations

A2Q+: Improving Accumulator-Aware Weight Quantization

Ian Colbert, Alessandro Pappalardo, Jakoba Petri-Koenig et al.

ICML 2024 · arXiv:2401.10432
10 citations

ERQ: Error Reduction for Post-Training Quantization of Vision Transformers

Yunshan Zhong, Jiawei Hu, You Huang et al.

ICML 2024 (spotlight)

ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking

Wenshuo Li, Xinghao Chen, Han Shu et al.

ICML 2024 · arXiv:2406.11257
9 citations

Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation

Zhewei Yao, Xiaoxia Wu, Cheng Li et al.

AAAI 2024 · arXiv:2303.08302
71 citations

Extreme Compression of Large Language Models via Additive Quantization

Vage Egiazarian, Andrei Panferov, Denis Kuznedelev et al.

ICML 2024 · arXiv:2401.06118
160 citations

OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models

Changhun Lee, Jungyu Jin, Taesu Kim et al.

AAAI 2024 · arXiv:2306.02272
105 citations