Poster Papers Matching "llm efficiency"
2 papers found
Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
Yuan Feng, Junlin Lv, Yukun Cao et al.
NeurIPS 2025 · arXiv:2407.11550 · 106 citations
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
Xiang Liu, Zhenheng Tang, Peijie Dong et al.
NeurIPS 2025 · arXiv:2502.00299 · 16 citations