Treasures in Discarded Weights for LLM Quantization

AAAI 2025

Abstract

In recent years, large language models (LLMs) have developed rapidly and revolutionized natural language processing. However, high storage overhead and computing costs limit LLM deployment in resource-constrained environments. Quantization algorithms can effectively compress LLMs and accelerate inference, but they incur a loss of accuracy, especially in low-bit settings. In this paper, we find that the weight values discarded during quantization in fact contain treasures that can improve LLMs' accuracy. To excavate these hidden treasures, we construct search spaces around the discarded weights; weights within a search space can be seamlessly incorporated into the original quantized weights. To determine which weights should be merged, we design a plug-and-play weight compensation framework that captures global information and retains the weights with the highest potential benefit. Our framework can be combined with various LLM quantization algorithms to achieve higher accuracy without additional inference overhead. We validate the effectiveness of our approach on widely used LLM benchmark datasets.
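
To make the general idea concrete, the sketch below is a minimal, hypothetical illustration rather than the paper's actual framework: it assumes symmetric round-to-nearest quantization, treats the +/-1-step neighborhood of each quantized weight as a "search space" around the discarded residual, and greedily keeps a shift only when it reduces the layer's output error on calibration inputs. The function names (quantize_rtn, compensate), the candidate set, and the greedy acceptance rule are all assumptions made for illustration.

# Hypothetical sketch (not the paper's algorithm): round-to-nearest quantization
# discards a per-weight residual; this searches a +/-1-step neighborhood around
# each quantized weight and keeps a shift only if it lowers the calibration
# output error. All names and choices here are illustrative assumptions.
import numpy as np

def quantize_rtn(W, n_bits=4):
    """Symmetric round-to-nearest quantization; returns integer codes and scale."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(W).max() / qmax
    Q = np.clip(np.round(W / scale), -qmax - 1, qmax).astype(np.int32)
    return Q, scale

def compensate(W, X, n_bits=4, candidates=(-1, 1)):
    """Greedy per-weight compensation against calibration activations X.

    W: (d_in, d_out) float weights, X: (n, d_in) calibration inputs.
    A shift is accepted only if it reduces ||X @ W - X @ W_hat||^2.
    """
    Q, scale = quantize_rtn(W, n_bits)
    qmax = 2 ** (n_bits - 1) - 1
    ref = X @ W                    # full-precision layer output
    out = X @ (Q * scale)          # current dequantized layer output
    err = np.sum((ref - out) ** 2)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            for delta in candidates:
                q_new = Q[i, j] + delta
                if not -qmax - 1 <= q_new <= qmax:
                    continue
                # Changing a single weight only moves column j of the output.
                new_col = out[:, j] + X[:, i] * (delta * scale)
                new_err = (err
                           - np.sum((ref[:, j] - out[:, j]) ** 2)
                           + np.sum((ref[:, j] - new_col) ** 2))
                if new_err < err:
                    Q[i, j] = q_new
                    out[:, j] = new_col
                    err = new_err
    return Q, scale

if __name__ == "__main__":
    # Toy usage on random data: compare plain RTN against the compensated codes.
    rng = np.random.default_rng(0)
    W = rng.standard_normal((16, 8)).astype(np.float32)
    X = rng.standard_normal((64, 16)).astype(np.float32)
    base_Q, base_scale = quantize_rtn(W, n_bits=4)
    Q, scale = compensate(W, X, n_bits=4)
    base_err = np.sum((X @ W - X @ (base_Q * base_scale)) ** 2)
    comp_err = np.sum((X @ W - X @ (Q * scale)) ** 2)
    print(f"RTN error: {base_err:.4f}  compensated error: {comp_err:.4f}")

This per-weight greedy search only looks at local layer error and costs one column update per candidate, whereas the paper describes a framework that captures global information when deciding which weights to merge; the sketch is meant only to show how residuals discarded by rounding can be revisited without changing the deployed bit-width or adding inference overhead.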
