Poster "post-training quantization" Papers
29 papers found
Binary Quadratic Quantization: Beyond First-Order Quantization for Real-Valued Matrix Compression
Kyo Kuroki, Yasuyuki Okoshi, Thiem Van Chu et al.
Data-Free Group-Wise Fully Quantized Winograd Convolution via Learnable Scales
Shuokai Pan, Gerti Tuzi, Sudarshan Sreeram et al.
DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization
Dongyeun Lee, Jiwan Hur, Hyounguk Shon et al.
ESCA: Enabling Seamless Codec Avatar Execution through Algorithm and Hardware Co-Optimization for Virtual Reality
Mingzhi Zhu, Ding Shang, Sai Qian Zhang
GPLQ: A General, Practical, and Lightning QAT Method for Vision Transformers
Guang Liang, Xinyao Liu, Jianxin Wu
Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression
Xi Zhang, Xiaolin Wu, Jiamang Wang et al.
MBQ: Modality-Balanced Quantization for Large Vision-Language Models
Shiyao Li, Yingchun Hu, Xuefei Ning et al.
OuroMamba: A Data-Free Quantization Framework for Vision Mamba
Akshat Ramachandran, Mingyu Lee, Huan Xu et al.
PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution
Libo Zhu, Jianze Li, Haotong Qin et al.
Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning
Maosen Zhao, Pengtao Chen, Chong Yu et al.
QERA: An Analytical Framework for Quantization Error Reconstruction
Cheng Zhang, Jeffrey T. H. Wong, Can Xiao et al.
Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment
Deokjae Lee, Hyun Oh Song
Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization
Yamato Arai, Yuma Ichikawa
Scaling Laws for Precision
Tanishq Kumar, Zachary Ankner, Benjamin Spector et al.
SpinQuant: LLM Quantization with Learned Rotations
Zechun Liu, Changsheng Zhao, Igor Fedorov et al.
Surprising Effectiveness of Pretraining Ternary Language Model at Scale
Ayush Kaushal, Tejas Vaidhya, Arnab Mondal et al.
SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models
Muyang Li, Yujun Lin, Zhekai Zhang et al.
ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba
Juncan Deng, Shuaiting Li, Zeyu Wang et al.
Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
Yeonhong Park, Jake Hyun, SangLyul Cho et al.
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Wei Huang, Yangdong Liu, Haotong Qin et al.
Evaluating Quantized Large Language Models
Shiyao Li, Xuefei Ning, Luning Wang et al.
FrameQuant: Flexible Low-Bit Quantization for Transformers
Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang et al.
Instance-Aware Group Quantization for Vision Transformers
Jaehyeon Moon, Dohyung Kim, Jun Yong Cheon et al.
LQER: Low-Rank Quantization Error Reconstruction for LLMs
Cheng Zhang, Jianyi Cheng, George Constantinides et al.
MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
Tianchen Zhao, Xuefei Ning, Tongcheng Fang et al.
Outlier-aware Slicing for Post-Training Quantization in Vision Transformer
Yuexiao Ma, Huixia Li, Xiawu Zheng et al.
PB-LLM: Partially Binarized Large Language Models
Zhihang Yuan, Yuzhang Shang, Zhen Dong
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks
Albert Tseng, Jerry Chee, Qingyao Sun et al.
SqueezeLLM: Dense-and-Sparse Quantization
Sehoon Kim, Coleman Hooper, Amir Gholaminejad et al.