"transformer acceleration" Papers
3 papers found

DuSA: Fast and Accurate Dual-Stage Sparse Attention Mechanism Accelerating Both Training and Inference
Chong Wu, Jiawang Cao, Renjie Xu et al.
NEURIPS 2025

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval
Di Liu, Meng Chen, Baotong Lu et al.
NEURIPS 2025 · arXiv:2409.10516 · 90 citations

Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers
Hongjie Wang, Bhishma Dedhia, Niraj Jha
CVPR 2024 · arXiv:2305.17328 · 61 citations