Poster Papers Matching "flashattention compatibility"
3 papers found
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models
Mohamed Dhouib, Davide Buscaldi, Sonia Vanier et al.
CVPR 2025 · arXiv:2504.08966 · 21 citations
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads
Hanlin Tang, Yang Lin, Jing Lin et al.
ICLR 2025 · arXiv:2407.15891 · 62 citations
TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model
Cheng Yang, Yang Sui, Jinqi Xiao et al.
CVPR 2025 · arXiv:2503.18278 · 24 citations