"kv cache quantization" Papers
4 papers found
Conference
PolarQuant: Leveraging Polar Transformation for Key Cache Quantization and Decoding Acceleration
Songhao Wu, Ang Lv, xiao feng et al.
NEURIPS 2025
SpinQuant: LLM Quantization with Learned Rotations
Zechun Liu, Changsheng Zhao, Igor Fedorov et al.
ICLR 2025arXiv:2405.16406
268
citations
Evaluating Quantized Large Language Models
Shiyao Li, Xuefei Ning, Luning Wang et al.
ICML 2024arXiv:2402.18158
83
citations
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Zirui Liu, Jiayi Yuan, Hongye Jin et al.
ICML 2024arXiv:2402.02750
368
citations