Poster "large language model serving" Papers
2 papers found
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
Ranajoy Sadhukhan, Jian Chen, Zhuoming Chen et al.
ICLR 2025, arXiv:2408.11049
64 citations
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Zirui Liu, Jiayi Yuan, Hongye Jin et al.
ICML 2024, arXiv:2402.02750
368 citations