"model quantization" Papers

24 papers found

BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models

Xingyu Zheng, Xianglong Liu, Haotong Qin et al.

ICLR 2025 · arXiv:2404.05662 · 7 citations

Catastrophic Failure of LLM Unlearning via Quantization

Zhiwei Zhang, Fali Wang, Xiaomin Li et al.

ICLR 2025 · arXiv:2410.16454 · 49 citations

DGQ: Distribution-Aware Group Quantization for Text-to-Image Diffusion Models

Hyogon Ryu, NaHyeon Park, Hyunjung Shim

ICLR 2025 · arXiv:2501.04304 · 7 citations

From Attention to Activation: Unraveling the Enigmas of Large Language Models

Prannay Kaul, Chengcheng Ma, Ismail Elezi et al.

ICLR 2025 · arXiv:2410.17174 · 8 citations

GPLQ: A General, Practical, and Lightning QAT Method for Vision Transformers

Guang Liang, Xinyao Liu, Jianxin Wu

NeurIPS 2025 · arXiv:2506.11784 · 4 citations

Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning

Maosen Zhao, Pengtao Chen, Chong Yu et al.

CVPR 2025 · arXiv:2505.21591 · 3 citations

Robust Machine Unlearning for Quantized Neural Networks via Adaptive Gradient Reweighting with Similar Labels

Yujia Tong, Yuze Wang, Jingling Yuan et al.

ICCV 2025 · arXiv:2503.13917 · 6 citations

Semantic Alignment and Reinforcement for Data-Free Quantization of Vision Transformers

Yunshan Zhong, Yuyao Zhou, Yuxin Zhang et al.

ICCV 2025 · arXiv:2412.16553

SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models

Muyang Li, Yujun Lin, Zhekai Zhang et al.

ICLR 2025 · arXiv:2411.05007 · 98 citations

SynQ: Accurate Zero-shot Quantization by Synthesis-aware Fine-tuning

Minjun Kim, Jongjin Kim, U Kang

ICLR 2025 · 8 citations

Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs

Hao Kang, Qingru Zhang, Han Cai et al.

NeurIPS 2025 (spotlight) · arXiv:2505.19481 · 6 citations

Your Scale Factors are My Weapon: Targeted Bit-Flip Attacks on Vision Transformers via Scale Factor Manipulation

Jialai Wang, Yuxiao Wu, Weiye Xu et al.

CVPR 2025 · 3 citations

Accurate LoRA-Finetuning Quantization of LLMs via Information Retention

Haotong Qin, Xudong Ma, Xingyu Zheng et al.

ICML 2024 · arXiv:2402.05445 · 74 citations

BiE: Bi-Exponent Block Floating-Point for Large Language Models Quantization

Lancheng Zou, Wenqian Zhao, Shuo Yin et al.

ICML 2024

Data Poisoning Quantization Backdoor Attack

Tran Huynh, Anh Tran, Khoa Doan et al.

ECCV 2024 · 3 citations

Memory-Efficient Fine-Tuning for Quantized Diffusion Model

Hyogon Ryu, Seohyun Lim, Hyunjung Shim

ECCV 2024 · arXiv:2401.04339 · 8 citations

MesonGS: Post-training Compression of 3D Gaussians via Efficient Attribute Transformation

Shuzhao Xie, Weixiang Zhang, Chen Tang et al.

ECCV 2024 · arXiv:2409.09756 · 41 citations

MGit: A Model Versioning and Management System

Wei Hao, Daniel Mendoza, Rafael Mendes et al.

ICML 2024 · arXiv:2307.07507 · 1 citation

Outlier-aware Slicing for Post-Training Quantization in Vision Transformer

Yuexiao Ma, Huixia Li, Xiawu Zheng et al.

ICML 2024

PB-LLM: Partially Binarized Large Language Models

Zhihang Yuan, Yuzhang Shang, Zhen Dong

ICLR 2024 · arXiv:2310.00034 · 82 citations

Purifying Quantization-conditioned Backdoors via Layer-wise Activation Correction with Distribution Approximation

Boheng Li, Yishuo Cai, Jisong Cai et al.

ICML 2024

Self-Adapting Large Visual-Language Models to Edge Devices across Visual Modalities

Kaiwen Cai, ZheKai Duan, Gaowen Liu et al.

ECCV 2024 · arXiv:2403.04908 · 10 citations

SFC: Achieve Accurate Fast Convolution under Low-precision Arithmetic

Liulu He, Yufei Zhao, Rui Gao et al.

ICML 2024 · arXiv:2407.02913 · 3 citations

Test-Time Model Adaptation with Only Forward Passes

Shuaicheng Niu, Chunyan Miao, Guohao Chen et al.

ICML 2024 · arXiv:2404.01650 · 60 citations