"post-training quantization" Papers

39 papers found

ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization

Weibo Zhao, Yubin Shi, Xinyu Lyu et al.

AAAI 2025 · arXiv:2411.07762
3 citations

Binary Quadratic Quantization: Beyond First-Order Quantization for Real-Valued Matrix Compression

Kyo Kuroki, Yasuyuki Okoshi, Thiem Van Chu et al.

NeurIPS 2025 · arXiv:2510.18650

Data-Free Group-Wise Fully Quantized Winograd Convolution via Learnable Scales

Shuokai Pan, Gerti Tuzi, Sudarshan Sreeram et al.

CVPR 2025 · arXiv:2412.19867

DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization

Dongyeun Lee, Jiwan Hur, Hyounguk Shon et al.

ICCV 2025 · arXiv:2507.12933
2 citations

ESCA: Enabling Seamless Codec Avatar Execution through Algorithm and Hardware Co-Optimization for Virtual Reality

Mingzhi Zhu, Ding Shang, Sai Qian Zhang

NeurIPS 2025 · arXiv:2510.24787

FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation

Zhuguanyu Wu, Shihe Wang, Jiayi Zhang et al.

CVPR 2025 (highlight) · arXiv:2506.11543
6 citations

GPLQ: A General, Practical, and Lightning QAT Method for Vision Transformers

Guang Liang, Xinyao Liu, Jianxin Wu

NeurIPS 2025 · arXiv:2506.11784
4 citations

HBLLM: Wavelet-Enhanced High-Fidelity 1-Bit Quantization for LLMs

Ningning Chen, Weicai Ye, Ying Jiang

NeurIPS 2025 (spotlight) · arXiv:2512.00862
1 citation

Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression

Xi Zhang, Xiaolin Wu, Jiamang Wang et al.

NeurIPS 2025 · arXiv:2510.20984

MBQ: Modality-Balanced Quantization for Large Vision-Language Models

Shiyao Li, Yingchun Hu, Xuefei Ning et al.

CVPR 2025 · arXiv:2412.19509
15 citations

OuroMamba: A Data-Free Quantization Framework for Vision Mamba

Akshat Ramachandran, Mingyu Lee, Huan Xu et al.

ICCV 2025 · arXiv:2503.10959
4 citations

PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution

Libo Zhu, Jianze Li, Haotong Qin et al.

CVPR 2025 · arXiv:2411.17106
10 citations

Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning

Maosen Zhao, Pengtao Chen, Chong Yu et al.

CVPR 2025 · arXiv:2505.21591
3 citations

QERA: An Analytical Framework for Quantization Error Reconstruction

Cheng Zhang, Jeffrey T. H. Wong, Can Xiao et al.

ICLR 2025 · arXiv:2410.06040
11 citations

Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment

Deokjae Lee, Hyun Oh Song

NeurIPS 2025 · arXiv:2509.20214

Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization

Yamato Arai, Yuma Ichikawa

NeurIPS 2025 · arXiv:2504.09629
11 citations

Scaling Laws for Precision

Tanishq Kumar, Zachary Ankner, Benjamin Spector et al.

ICLR 2025 · arXiv:2411.04330
68 citations

SpinQuant: LLM Quantization with Learned Rotations

Zechun Liu, Changsheng Zhao, Igor Fedorov et al.

ICLR 2025 · arXiv:2405.16406
268 citations

Surprising Effectiveness of Pretraining Ternary Language Model at Scale

Ayush Kaushal, Tejas Vaidhya, Arnab Mondal et al.

ICLR 2025 · arXiv:2407.12327
13 citations

SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models

Muyang Li, Yujun Lin, Zhekai Zhang et al.

ICLR 2025 · arXiv:2411.05007
98 citations

TCAQ-DM: Timestep-Channel Adaptive Quantization for Diffusion Models

Haocheng Huang, Jiaxin Chen, Jinyang Guo et al.

AAAI 2025 · arXiv:2412.16700
3 citations

TinySAM: Pushing the Envelope for Efficient Segment Anything Model

Han Shu, Wenshuo Li, Yehui Tang et al.

AAAI 2025 · arXiv:2312.13789
41 citations

VETA-DiT: Variance-Equalized and Temporally Adaptive Quantization for Efficient 4-bit Diffusion Transformers

Qinkai Xu, Yijin Liu, Yang Chen et al.

NeurIPS 2025 (oral)

ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba

Juncan Deng, Shuaiting Li, Zeyu Wang et al.

ICCV 2025 · arXiv:2503.09509

VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers

Juncan Deng, Shuaiting Li, Zeyu Wang et al.

AAAI 2025 · arXiv:2408.17131
11 citations

Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs

Yeonhong Park, Jake Hyun, SangLyul Cho et al.

ICML 2024 · arXiv:2402.10517
43 citations

BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Wei Huang, Yangdong Liu, Haotong Qin et al.

ICML 2024 · arXiv:2402.04291
142 citations

ERQ: Error Reduction for Post-Training Quantization of Vision Transformers

Yunshan Zhong, Jiawei Hu, You Huang et al.

ICML 2024 (spotlight)

Evaluating Quantized Large Language Models

Shiyao Li, Xuefei Ning, Luning Wang et al.

ICML 2024 · arXiv:2402.18158
83 citations

Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation

Zhewei Yao, Xiaoxia Wu, Cheng Li et al.

AAAI 2024 · arXiv:2303.08302
71 citations

FrameQuant: Flexible Low-Bit Quantization for Transformers

Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang et al.

ICML 2024 · arXiv:2403.06082
16 citations

Instance-Aware Group Quantization for Vision Transformers

Jaehyeon Moon, Dohyung Kim, Jun Yong Cheon et al.

CVPR 2024 · arXiv:2404.00928
15 citations

LQER: Low-Rank Quantization Error Reconstruction for LLMs

Cheng Zhang, Jianyi Cheng, George Constantinides et al.

ICML 2024 · arXiv:2402.02446
27 citations

Make RepVGG Greater Again: A Quantization-Aware Approach

Xuesong Nie, Yunfeng Yan, Siyuan Li et al.

AAAI 2024 · arXiv:2212.01593
66 citations

MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization

Tianchen Zhao, Xuefei Ning, Tongcheng Fang et al.

ECCV 2024 · arXiv:2405.17873
37 citations

Outlier-aware Slicing for Post-Training Quantization in Vision Transformer

Yuexiao Ma, Huixia Li, Xiawu Zheng et al.

ICML 2024

PB-LLM: Partially Binarized Large Language Models

Zhihang Yuan, Yuzhang Shang, Zhen Dong

ICLR 2024 · arXiv:2310.00034
82 citations

QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks

Albert Tseng, Jerry Chee, Qingyao Sun et al.

ICML 2024 · arXiv:2402.04396
241 citations

SqueezeLLM: Dense-and-Sparse Quantization

Sehoon Kim, Coleman Hooper, Amir Gholaminejad et al.

ICML 2024 · arXiv:2306.07629
272 citations