"weight quantization" Papers
11 papers found
Cauchy-Schwarz Regularizers
Sueda Taner, Ziyi Wang, Christoph Studer
ICLR 2025 · arXiv:2503.01639
Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression
Xi Zhang, Xiaolin Wu, Jiamang Wang et al.
NeurIPS 2025 · arXiv:2510.20984
S$^2$NN: Sub-bit Spiking Neural Networks
Wenjie Wei, Malu Zhang, Jieyuan (Eric) Zhang et al.
NeurIPS 2025 · arXiv:2509.24266
Training-Free Activation Sparsity in Large Language Models
James Liu, Pragaash Ponnusamy, Tianle Cai et al.
ICLR 2025 · arXiv:2408.14690
39 citations
VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers
Juncan Deng, Shuaiting Li, Zeyu Wang et al.
AAAI 2025 · arXiv:2408.17131
11 citations
A2Q+: Improving Accumulator-Aware Weight Quantization
Ian Colbert, Alessandro Pappalardo, Jakoba Petri-Koenig et al.
ICML 2024 · arXiv:2401.10432
10 citations
ERQ: Error Reduction for Post-Training Quantization of Vision Transformers
Yunshan Zhong, Jiawei Hu, You Huang et al.
ICML 2024 (spotlight)
ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking
Wenshuo Li, Xinghao Chen, Han Shu et al.
ICML 2024 · arXiv:2406.11257
9 citations
Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
Zhewei Yao, Xiaoxia Wu, Cheng Li et al.
AAAI 2024 · arXiv:2303.08302
71 citations
Extreme Compression of Large Language Models via Additive Quantization
Vage Egiazarian, Andrei Panferov, Denis Kuznedelev et al.
ICML 2024 · arXiv:2401.06118
160 citations
OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models
Changhun Lee, Jungyu Jin, Taesu Kim et al.
AAAI 2024 · arXiv:2306.02272
105 citations