"post-training quantization" Papers
39 papers found
ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization
Weibo Zhao, Yubin Shi, Xinyu Lyu et al.
Binary Quadratic Quantization: Beyond First-Order Quantization for Real-Valued Matrix Compression
Kyo Kuroki, Yasuyuki Okoshi, Thiem Van Chu et al.
Data-Free Group-Wise Fully Quantized Winograd Convolution via Learnable Scales
Shuokai Pan, Gerti Tuzi, Sudarshan Sreeram et al.
DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization
Dongyeun Lee, Jiwan Hur, Hyounguk Shon et al.
ESCA: Enabling Seamless Codec Avatar Execution through Algorithm and Hardware Co-Optimization for Virtual Reality
Mingzhi Zhu, Ding Shang, Sai Qian Zhang
FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation
Zhuguanyu Wu, Shihe Wang, Jiayi Zhang et al.
GPLQ: A General, Practical, and Lightning QAT Method for Vision Transformers
Guang Liang, Xinyao Liu, Jianxin Wu
HBLLM: Wavelet-Enhanced High-Fidelity 1-Bit Quantization for LLMs
Ningning Chen, Weicai Ye, Ying Jiang
Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression
Xi Zhang, Xiaolin Wu, Jiamang Wang et al.
MBQ: Modality-Balanced Quantization for Large Vision-Language Models
Shiyao Li, Yingchun Hu, Xuefei Ning et al.
OuroMamba: A Data-Free Quantization Framework for Vision Mamba
Akshat Ramachandran, Mingyu Lee, Huan Xu et al.
PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution
Libo Zhu, Jianze Li, Haotong Qin et al.
Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning
Maosen Zhao, Pengtao Chen, Chong Yu et al.
QERA: an Analytical Framework for Quantization Error Reconstruction
Cheng Zhang, Jeffrey T. H. Wong, Can Xiao et al.
Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment
Deokjae Lee, Hyun Oh Song
Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization
Yamato Arai, Yuma Ichikawa
Scaling Laws for Precision
Tanishq Kumar, Zachary Ankner, Benjamin Spector et al.
SpinQuant: LLM Quantization with Learned Rotations
Zechun Liu, Changsheng Zhao, Igor Fedorov et al.
Surprising Effectiveness of Pretraining Ternary Language Models at Scale
Ayush Kaushal, Tejas Vaidhya, Arnab Mondal et al.
SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models
Muyang Li, Yujun Lin, Zhekai Zhang et al.
TCAQ-DM: Timestep-Channel Adaptive Quantization for Diffusion Models
Haocheng Huang, Jiaxin Chen, Jinyang Guo et al.
TinySAM: Pushing the Envelope for Efficient Segment Anything Model
Han Shu, Wenshuo Li, Yehui Tang et al.
VETA-DiT: Variance-Equalized and Temporally Adaptive Quantization for Efficient 4-bit Diffusion Transformers
Qinkai Xu, Yijin Liu, Yang Chen et al.
ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba
Juncan Deng, Shuaiting Li, Zeyu Wang et al.
VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers
Juncan Deng, Shuaiting Li, Zeyu Wang et al.
Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
Yeonhong Park, Jake Hyun, SangLyul Cho et al.
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Wei Huang, Yangdong Liu, Haotong Qin et al.
ERQ: Error Reduction for Post-Training Quantization of Vision Transformers
Yunshan Zhong, Jiawei Hu, You Huang et al.
Evaluating Quantized Large Language Models
Shiyao Li, Xuefei Ning, Luning Wang et al.
Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
Zhewei Yao, Xiaoxia Wu, Cheng Li et al.
FrameQuant: Flexible Low-Bit Quantization for Transformers
Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang et al.
Instance-Aware Group Quantization for Vision Transformers
Jaehyeon Moon, Dohyung Kim, Jun Yong Cheon et al.
LQER: Low-Rank Quantization Error Reconstruction for LLMs
Cheng Zhang, Jianyi Cheng, George Constantinides et al.
Make RepVGG Greater Again: A Quantization-Aware Approach
Xuesong Nie, Yunfeng Yan, Siyuan Li et al.
MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
Tianchen Zhao, Xuefei Ning, Tongcheng Fang et al.
Outlier-aware Slicing for Post-Training Quantization in Vision Transformer
Yuexiao Ma, Huixia Li, Xiawu Zheng et al.
PB-LLM: Partially Binarized Large Language Models
Zhihang Yuan, Yuzhang Shang, Zhen Dong
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
Albert Tseng, Jerry Chee, Qingyao Sun et al.
SqueezeLLM: Dense-and-Sparse Quantization
Sehoon Kim, Coleman Hooper, Amir Gholaminejad et al.