Papers by Amir Abdi
2 papers found
Conference
MMInference: Accelerating Pre-filling for Long-Context Visual Language Models via Modality-Aware Permutation Sparse Attention
Yucheng Li, Huiqiang Jiang, Chengruidong Zhang et al.
ICML 2025 (oral)
SCBench: A KV Cache-Centric Analysis of Long-Context Methods
Yucheng Li, Huiqiang Jiang, Qianhui Wu et al.
ICLR 2025, arXiv:2412.10319
38 citations