"sparse attention" Papers
23 papers found
Adaptive Computation Pruning for the Forgetting Transformer
Zhixuan Lin, Johan Obando-Ceron, Xu Owen He et al.
DuSA: Fast and Accurate Dual-Stage Sparse Attention Mechanism Accelerating Both Training and Inference
Chong Wu, Jiawang Cao, Renjie Xu et al.
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
Xunhao Lai, Jianqiao Lu, Yao Luo et al.
Hardware-aligned Hierarchical Sparse Attention for Efficient Long-term Memory Access
Xiang Hu, Jiaqi Leng, Jun Zhao et al.
Inference-Time Hyper-Scaling with KV Cache Compression
Adrian Łańcucki, Konrad Staniszewski, Piotr Nawrot et al.
Kinetics: Rethinking Test-Time Scaling Law
Ranajoy Sadhukhan, Zhuoming Chen, Haizhong Zheng et al.
LevAttention: Time, Space and Streaming Efficient Algorithm for Heavy Attentions
Ravindran Kannan, Chiranjib Bhattacharyya, Praneeth Kacham et al.
MagicPIG: LSH Sampling for Efficient LLM Generation
Zhuoming Chen, Ranajoy Sadhukhan, Zihao Ye et al.
MoBA: Mixture of Block Attention for Long-Context LLMs
Enzhe Lu, Zhejun Jiang, Jingyuan Liu et al.
Overcoming Long Context Limitations of State Space Models via Context Dependent Sparse Attention
Zhihao Zhan, Jianan Zhao, Zhaocheng Zhu et al.
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
Kwanyoung Kim, Byeongsu Sim
Re-ttention: Ultra Sparse Visual Generation via Attention Statistical Reshape
Ruichen Chen, Keith Mills, Liyao Jiang et al.
SALS: Sparse Attention in Latent Space for KV Cache Compression
Junlin Mu, Hantao Huang, Jihang Zhang et al.
SCENT: Robust Spatiotemporal Learning for Continuous Scientific Data via Scalable Conditioned Neural Fields
David K Park, Xihaier Luo, Guang Zhao et al.
Spark Transformer: Reactivating Sparsity in Transformer FFN and Attention
Chong You, Kan Wu, Zhipeng Jia et al.
Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation
Shuo Yang, Haocheng Xi, Yilong Zhao et al.
The emergence of sparse attention: impact of data distribution and benefits of repetition
Nicolas Zucchet, Francesco D'Angelo, Andrew Lampinen et al.
Training-free and Adaptive Sparse Attention for Efficient Long Video Generation
Yifei Xia, Suhan Ling, Fangcheng Fu et al.
Transformers Learn Faster with Semantic Focus
Parikshit Ram, Kenneth Clarkson, Tim Klinger et al.
Twilight: Adaptive Attention Sparsity with Hierarchical Top-$p$ Pruning
Chaofan Lin, Jiaming Tang, Shuo Yang et al.
VarDrop: Enhancing Training Efficiency by Reducing Variate Redundancy in Periodic Time Series Forecasting
Junhyeok Kang, Yooju Shin, Jae-Gil Lee
Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring
Huicong Zhang, Haozhe Xie, Hongxun Yao
MultiMax: Sparse and Multi-Modal Attention Learning
Yuxuan Zhou, Mario Fritz, Margret Keuper