Poster Papers: "llm inference optimization"
7 papers found
Accurate KV Cache Eviction via Anchor Direction Projection for Efficient LLM Inference
Zijie Geng, Jie Wang, Ziqi Liu et al.
NEURIPS 2025
ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
Xinhao Luo, Zihan Liu, Yangjie Zhou et al.
NEURIPS 2025 · arXiv:2508.18850
2 citations
DynaPipe: Dynamic Layer Redistribution for Efficient Serving of LLMs with Pipeline Parallelism
HongXin Xu, Tianyu Guo, Xianwei Zhang
NEURIPS 2025
KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments
Junyoung Park, Dalton Jones, Matthew Morse et al.
NEURIPS 2025 · arXiv:2504.15364
17 citations
Learned Prefix Caching for Efficient LLM Inference
Dongsheng Yang, Austin Li, Kai Li et al.
NEURIPS 2025
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu, Peter Bailis, Ion Stoica et al.
ICML 2024 · arXiv:2402.02057
257 citations
CaM: Cache Merging for Memory-efficient LLMs Inference
Yuxin Zhang, Yuxuan Du, Gen Luo et al.
ICML 2024