"transformer efficiency" Papers
10 papers found
Attribution-Driven Adaptive Token Pruning for Transformers
Yaoyao Yan, Hui Yu, Weizhi Xu
NeurIPS 2025
Degrees of Freedom for Linear Attention: Distilling Softmax Attention with Optimal Feature Efficiency
Naoki Nishikawa, Rei Higuchi, Taiji Suzuki
NeurIPS 2025 · arXiv:2507.03340 · 1 citation
Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction
Jeffrey Willette, Heejun Lee, Sung Ju Hwang
NeurIPS 2025 · arXiv:2505.11254 · 3 citations
Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
Yongxin Guo, Zhenglin Cheng, Xiaoying Tang et al.
ICLR 2025 · arXiv:2405.14297 · 36 citations
FlashBias: Fast Computation of Attention with Bias
Haixu Wu, Minghao Guo, Yuezhou Ma et al.
NeurIPS 2025 · arXiv:2505.12044 · 1 citation
Fourier Token Merging: Understanding and Capitalizing Frequency Domain for Efficient Image Generation
Jiesong Liu, Xipeng Shen
NeurIPS 2025
LevAttention: Time, Space and Streaming Efficient Algorithm for Heavy Attentions
Ravindran Kannan, Chiranjib Bhattacharyya, Praneeth Kacham et al.
ICLR 2025 · arXiv:2410.05462 · 2 citations
ZeroS: Zero-Sum Linear Attention for Efficient Transformers
Jiecheng Lu, Xu Han, Yan Sun et al.
NeurIPS 2025 (spotlight) · arXiv:2602.05230
DiJiang: Efficient Large Language Models through Compact Kernelization
Hanting Chen, Liuzhicheng Liuzhicheng, Xutao Wang et al.
ICML 2024 · arXiv:2403.19928 · 11 citations
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
Jialong Guo, Xinghao Chen, Yehui Tang et al.
ICML 2024 · arXiv:2405.11582 · 34 citations