Poster "llm inference acceleration" Papers
5 papers found
EasySpec: Layer-Parallel Speculative Decoding for Efficient Multi-GPU Utilization
Yize Wu, Ke Gao, Ling Li et al.
NeurIPS 2025 · arXiv:2502.02493 · 1 citation

MUSTAFAR: Promoting Unstructured Sparsity for KV Cache Pruning in LLM Inference
Donghyeon Joo, Helya Hosseini, Ramyad Hadidi et al.
NeurIPS 2025 · arXiv:2505.22913 · 2 citations

SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Heming Xia, Yongqi Li, Jun Zhang et al.
ICLR 2025 · arXiv:2410.06916 · 41 citations

KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation
Minsik Cho, Mohammad Rastegari, Devang Naik
ICML 2024 · arXiv:2405.05329 · 11 citations

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Tianle Cai, Yuhong Li, Zhengyang Geng et al.
ICML 2024 · arXiv:2401.10774 · 549 citations