"autoregressive decoding acceleration" Papers
2 papers found
Conference
Adaptive Draft-Verification for Efficient Large Language Model Decoding
Xukun Liu, Bowen Lei, Ruqi Zhang et al.
AAAI 2025, arXiv:2407.12021
8 citations
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu, Peter Bailis, Ion Stoica et al.
ICML 2024, arXiv:2402.02057
257 citations