Poster "llm inference acceleration" Papers
5 papers found
EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization
Yize Wu, Ke Gao, Ling Li et al.
NeurIPS 2025 · arXiv:2502.02493 · 1 citation

MUSTAFAR: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference
Donghyeon Joo, Helya Hosseini, Ramyad Hadidi et al.
NeurIPS 2025 · arXiv:2505.22913 · 2 citations

SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Heming Xia, Yongqi Li, Jun Zhang et al.
ICLR 2025 · arXiv:2410.06916 · 41 citations

KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation
Minsik Cho, Mohammad Rastegari, Devang Naik
ICML 2024 · arXiv:2405.05329 · 11 citations

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Tianle Cai, Yuhong Li, Zhengyang Geng et al.
ICML 2024 · arXiv:2401.10774 · 549 citations