"key-value caching" Papers
4 papers found
Conference
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
Di Liu, Meng Chen, Baotong Lu et al.
NeurIPS 2025 · arXiv:2409.10516 · 90 citations
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion
Xun Huang, Zhengqi Li, Guande He et al.
NeurIPS 2025 (spotlight) · arXiv:2506.08009 · 145 citations
Training Free Exponential Context Extension via Cascading KV Cache
Jeff Willette, Heejun Lee, Youngwan Lee et al.
ICLR 2025 · arXiv:2406.17808 · 3 citations
GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding
Cunxiao Du, Jing Jiang, Xu Yuanchen et al.
ICML 2024 · arXiv:2402.02082 · 65 citations