"attention mechanism optimization" Papers
4 papers found
DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
Jinwei Yao, Kaiqi Chen, Kexun Zhang et al.
ICLR 2025 · arXiv:2404.00242 · 9 citations
UniGist: Towards General and Hardware-aligned Sequence-level Long Context Compression
Chenlong Deng, Zhisong Zhang, Kelong Mao et al.
NeurIPS 2025 · arXiv:2509.15763 · 4 citations
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
Harry Dong, Xinyu Yang, Zhenyu Zhang et al.
ICML 2024 · arXiv:2402.09398 · 79 citations
MobileNetV4: Universal Models for the Mobile Ecosystem
Danfeng Qin, Chas Leichner, Manolis Delakis et al.
ECCV 2024 · arXiv:2404.10518 · 434 citations