Poster "memory bottleneck" Papers
2 papers found
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
Xiang Liu, Zhenheng Tang, Peijie Dong et al.
NeurIPS 2025 · arXiv:2502.00299 · 16 citations
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
Harry Dong, Xinyu Yang, Zhenyu Zhang et al.
ICML 2024 · arXiv:2402.09398 · 79 citations