"quantization" Papers
5 papers found
KVSink: Understanding and Enhancing the Preservation of Attention Sinks in KV Cache Quantization for LLMs
Zunhai Su, Kehong Yuan
COLM 2025 · arXiv:2508.04257
8 citations
QSVD: Efficient Low-rank Approximation for Unified Query-Key-Value Weight Compression in Low-Precision Vision-Language Models
Yutong Wang, Haiyu Wang, Sai Qian Zhang
NeurIPS 2025 (spotlight) · arXiv:2510.16292
1 citation
The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov, Kushal Tirumala, Hassan Shapourian et al.
ICLR 2025 · arXiv:2403.17887
172 citations
Compressing Large Language Models by Joint Sparsification and Quantization
Jinyang Guo, Jianyu Wu, Zining Wang et al.
ICML 2024
Fed-QSSL: A Framework for Personalized Federated Learning under Bitwidth and Data Heterogeneity
Yiyue Chen, Haris Vikalo, Chianing Wang
AAAI 2024 · arXiv:2312.13380
13 citations