Poster "post-training quantization" Papers

29 papers found

Binary Quadratic Quantization: Beyond First-Order Quantization for Real-Valued Matrix Compression

Kyo Kuroki, Yasuyuki Okoshi, Thiem Van Chu et al.

NeurIPS 2025 · arXiv:2510.18650

Data-Free Group-Wise Fully Quantized Winograd Convolution via Learnable Scales

Shuokai Pan, Gerti Tuzi, Sudarshan Sreeram et al.

CVPR 2025 · arXiv:2412.19867

DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization

Dongyeun Lee, Jiwan Hur, Hyounguk Shon et al.

ICCV 2025 · arXiv:2507.12933
2 citations

ESCA: Enabling Seamless Codec Avatar Execution through Algorithm and Hardware Co-Optimization for Virtual Reality

Mingzhi Zhu, Ding Shang, Sai Qian Zhang

NeurIPS 2025 · arXiv:2510.24787

GPLQ: A General, Practical, and Lightning QAT Method for Vision Transformers

Guang Liang, Xinyao Liu, Jianxin Wu

NeurIPS 2025 · arXiv:2506.11784
4 citations

Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression

Xi Zhang, Xiaolin Wu, Jiamang Wang et al.

NeurIPS 2025 · arXiv:2510.20984

MBQ: Modality-Balanced Quantization for Large Vision-Language Models

Shiyao Li, Yingchun Hu, Xuefei Ning et al.

CVPR 2025 · arXiv:2412.19509
15 citations

OuroMamba: A Data-Free Quantization Framework for Vision Mamba

Akshat Ramachandran, Mingyu Lee, Huan Xu et al.

ICCV 2025 · arXiv:2503.10959
4 citations

PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution

Libo Zhu, Jianze Li, Haotong Qin et al.

CVPR 2025 · arXiv:2411.17106
10 citations

Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning

Maosen Zhao, Pengtao Chen, Chong Yu et al.

CVPR 2025 · arXiv:2505.21591
3 citations

QERA: an Analytical Framework for Quantization Error Reconstruction

Cheng Zhang, Jeffrey T. H. Wong, Can Xiao et al.

ICLR 2025 · arXiv:2410.06040
11 citations

Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment

Deokjae Lee, Hyun Oh Song

NeurIPS 2025 · arXiv:2509.20214

Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization

Yamato Arai, Yuma Ichikawa

NeurIPS 2025 · arXiv:2504.09629
11 citations

Scaling Laws for Precision

Tanishq Kumar, Zachary Ankner, Benjamin Spector et al.

ICLR 2025 · arXiv:2411.04330
68 citations

SpinQuant: LLM Quantization with Learned Rotations

Zechun Liu, Changsheng Zhao, Igor Fedorov et al.

ICLR 2025 · arXiv:2405.16406
268 citations

Surprising Effectiveness of Pretraining Ternary Language Models at Scale

Ayush Kaushal, Tejas Vaidhya, Arnab Mondal et al.

ICLR 2025 · arXiv:2407.12327
13 citations

SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models

Muyang Li, Yujun Lin, Zhekai Zhang et al.

ICLR 2025 · arXiv:2411.05007
98 citations

ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba

Juncan Deng, Shuaiting Li, Zeyu Wang et al.

ICCV 2025 · arXiv:2503.09509

Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs

Yeonhong Park, Jake Hyun, SangLyul Cho et al.

ICML 2024 · arXiv:2402.10517
43 citations

BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Wei Huang, Yangdong Liu, Haotong Qin et al.

ICML 2024 · arXiv:2402.04291
142 citations

Evaluating Quantized Large Language Models

Shiyao Li, Xuefei Ning, Luning Wang et al.

ICML 2024 · arXiv:2402.18158
83 citations

FrameQuant: Flexible Low-Bit Quantization for Transformers

Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang et al.

ICML 2024 · arXiv:2403.06082
16 citations

Instance-Aware Group Quantization for Vision Transformers

Jaehyeon Moon, Dohyung Kim, Jun Yong Cheon et al.

CVPR 2024 · arXiv:2404.00928
15 citations

LQER: Low-Rank Quantization Error Reconstruction for LLMs

Cheng Zhang, Jianyi Cheng, George Constantinides et al.

ICML 2024 · arXiv:2402.02446
27 citations

MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization

Tianchen Zhao, Xuefei Ning, Tongcheng Fang et al.

ECCV 2024 · arXiv:2405.17873
37 citations

Outlier-aware Slicing for Post-Training Quantization in Vision Transformer

Yuexiao Ma, Huixia Li, Xiawu Zheng et al.

ICML 2024

PB-LLM: Partially Binarized Large Language Models

Zhihang Yuan, Yuzhang Shang, Zhen Dong

ICLR 2024 · arXiv:2310.00034
82 citations

QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks

Albert Tseng, Jerry Chee, Qingyao Sun et al.

ICML 2024 · arXiv:2402.04396
241 citations

SqueezeLLM: Dense-and-Sparse Quantization

Sehoon Kim, Coleman Hooper, Amir Gholaminejad et al.

ICML 2024 · arXiv:2306.07629
272 citations