"low-bit quantization" Papers

25 papers found

ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization

Weibo Zhao, Yubin Shi, Xinyu Lyu et al.

AAAI 2025 (paper) · arXiv:2411.07762
3 citations

Binary Quadratic Quantization: Beyond First-Order Quantization for Real-Valued Matrix Compression

Kyo Kuroki, Yasuyuki Okoshi, Thiem Van Chu et al.

NEURIPS 2025 · arXiv:2510.18650

CASP: Compression of Large Multimodal Models Based on Attention Sparsity

Mohsen Gholami, Mohammad Akbari, Kevin Cannons et al.

CVPR 2025 (highlight) · arXiv:2503.05936
4 citations

CodeGEMM: A Codebook-Centric Approach to Efficient GEMM in Quantized LLMs

Gunho Park, Jeongin Bae, Byeongwook Kim et al.

NEURIPS 2025 · arXiv:2512.17970

DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization

Dongyeun Lee, Jiwan Hur, Hyounguk Shon et al.

ICCV 2025 · arXiv:2507.12933
2 citations

FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation

Zhuguanyu Wu, Shihe Wang, Jiayi Zhang et al.

CVPR 2025 (highlight) · arXiv:2506.11543
6 citations

GPLQ: A General, Practical, and Lightning QAT Method for Vision Transformers

Guang Liang, Xinyao Liu, Jianxin Wu

NEURIPS 2025 · arXiv:2506.11784
4 citations

MimiQ: Low-Bit Data-Free Quantization of Vision Transformers with Encouraging Inter-Head Attention Similarity

Kanghyun Choi, Hyeyoon Lee, Dain Kwon et al.

AAAI 2025 (paper) · arXiv:2407.20021
7 citations

Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning

Maosen Zhao, Pengtao Chen, Chong Yu et al.

CVPR 2025 · arXiv:2505.21591
3 citations

Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization

Yamato Arai, Yuma Ichikawa

NEURIPS 2025 · arXiv:2504.09629
11 citations

RSAVQ: Riemannian Sensitivity-Aware Vector Quantization for Large Language Models

Zukang Xu, Xing Hu, Qiang Wu et al.

NEURIPS 2025 · arXiv:2510.01240

SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators

Rasoul Shafipour, David Harrison, Maxwell Horton et al.

ICLR 2025 · arXiv:2410.10714
2 citations

Split Adaptation for Pre-trained Vision Transformers

Lixu Wang, Bingqi Shang, Yi Li et al.

CVPR 2025 · arXiv:2503.00441
2 citations

VETA-DiT: Variance-Equalized and Temporally Adaptive Quantization for Efficient 4-bit Diffusion Transformers

Qinkai Xu, Yijin Liu, Yang Chen et al.

NEURIPS 2025 (oral)

ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba

Juncan Deng, Shuaiting Li, Zeyu Wang et al.

ICCV 2025 · arXiv:2503.09509

Accurate LoRA-Finetuning Quantization of LLMs via Information Retention

Haotong Qin, Xudong Ma, Xingyu Zheng et al.

ICML 2024 · arXiv:2402.05445
74 citations

Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs

Yeonhong Park, Jake Hyun, SangLyul Cho et al.

ICML 2024 · arXiv:2402.10517
43 citations

A Simple Low-bit Quantization Framework for Video Snapshot Compressive Imaging

Miao Cao, Lishun Wang, Huan Wang et al.

ECCV 2024 · arXiv:2407.21517
4 citations

BiE: Bi-Exponent Block Floating-Point for Large Language Models Quantization

Lancheng Zou, Wenqian Zhao, Shuo Yin et al.

ICML 2024

Extreme Compression of Large Language Models via Additive Quantization

Vage Egiazarian, Andrei Panferov, Denis Kuznedelev et al.

ICML 2024 · arXiv:2401.06118
160 citations

FrameQuant: Flexible Low-Bit Quantization for Transformers

Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang et al.

ICML 2024 · arXiv:2403.06082
16 citations

GenQ: Quantization in Low Data Regimes with Generative Synthetic Data

Yuhang Li, Youngeun Kim, Donghyun Lee et al.

ECCV 2024 · arXiv:2312.05272
6 citations

PB-LLM: Partially Binarized Large Language Models

Zhihang Yuan, Yuzhang Shang, Zhen Dong

ICLR 2024 · arXiv:2310.00034
82 citations

Sharpness-Aware Data Generation for Zero-shot Quantization

Hoang Dung, Cuong Pham, Trung Le et al.

ICML 2024 · arXiv:2510.07018
6 citations

Towards Robust Full Low-bit Quantization of Super Resolution Networks

Denis Makhov, Irina Zhelavskaya, Ruslan Ostapets et al.

ECCV 2024
1 citation