"low-bit quantization" Papers
25 papers found
ASER: Activation Smoothing and Error Reconstruction for Large Language Model Quantization
Weibo Zhao, Yubin Shi, Xinyu Lyu et al.
Binary Quadratic Quantization: Beyond First-Order Quantization for Real-Valued Matrix Compression
Kyo Kuroki, Yasuyuki Okoshi, Thiem Van Chu et al.
CASP: Compression of Large Multimodal Models Based on Attention Sparsity
Mohsen Gholami, Mohammad Akbari, Kevin Cannons et al.
CodeGEMM: A Codebook-Centric Approach to Efficient GEMM in Quantized LLMs
Gunho Park, Jeongin Bae, Byeongwook Kim et al.
DMQ: Dissecting Outliers of Diffusion Models for Post-Training Quantization
Dongyeun Lee, Jiwan Hur, Hyounguk Shon et al.
FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation
Zhuguanyu Wu, Shihe Wang, Jiayi Zhang et al.
GPLQ: A General, Practical, and Lightning QAT Method for Vision Transformers
Guang Liang, Xinyao Liu, Jianxin Wu
MimiQ: Low-Bit Data-Free Quantization of Vision Transformers with Encouraging Inter-Head Attention Similarity
Kanghyun Choi, Hyeyoon Lee, Dain Kwon et al.
Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning
Maosen Zhao, Pengtao Chen, Chong Yu et al.
Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization
Yamato Arai, Yuma Ichikawa
RSAVQ: Riemannian Sensitivity-Aware Vector Quantization for Large Language Models
Zukang Xu, Xing Hu, Qiang Wu et al.
SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators
Rasoul Shafipour, David Harrison, Maxwell Horton et al.
Split Adaptation for Pre-trained Vision Transformers
Lixu Wang, Bingqi Shang, Yi Li et al.
VETA-DiT: Variance-Equalized and Temporally Adaptive Quantization for Efficient 4-bit Diffusion Transformers
Qinkai Xu, Yijin Liu, Yang Chen et al.
ViM-VQ: Efficient Post-Training Vector Quantization for Visual Mamba
Juncan Deng, Shuaiting Li, Zeyu Wang et al.
Accurate LoRA-Finetuning Quantization of LLMs via Information Retention
Haotong Qin, Xudong Ma, Xingyu Zheng et al.
Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
Yeonhong Park, Jake Hyun, SangLyul Cho et al.
A Simple Low-bit Quantization Framework for Video Snapshot Compressive Imaging
Miao Cao, Lishun Wang, Huan Wang et al.
BiE: Bi-Exponent Block Floating-Point for Large Language Models Quantization
Lancheng Zou, Wenqian Zhao, Shuo Yin et al.
Extreme Compression of Large Language Models via Additive Quantization
Vage Egiazarian, Andrei Panferov, Denis Kuznedelev et al.
FrameQuant: Flexible Low-Bit Quantization for Transformers
Harshavardhan Adepu, Zhanpeng Zeng, Li Zhang et al.
GenQ: Quantization in Low Data Regimes with Generative Synthetic Data
Yuhang Li, Youngeun Kim, Donghyun Lee et al.
PB-LLM: Partially Binarized Large Language Models
Zhihang Yuan, Yuzhang Shang, Zhen Dong
Sharpness-Aware Data Generation for Zero-shot Quantization
Hoang Dung, Cuong Pham, Trung Le et al.
Towards Robust Full Low-bit Quantization of Super Resolution Networks
Denis Makhov, Irina Zhelavskaya, Ruslan Ostapets et al.