"transformer efficiency" Papers

10 papers found

Attribution-Driven Adaptive Token Pruning for Transformers

Yaoyao Yan, Hui Yu, Weizhi Xu

NeurIPS 2025
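
The entry above concerns pruning tokens from a transformer's sequence to save compute. For orientation only, here is a minimal PyTorch sketch of generic attention-score-based token pruning; the function name, the keep_ratio parameter, and the "mean received attention" criterion are illustrative assumptions and not the paper's attribution-driven method.

import torch

def prune_tokens_by_attention(x, attn, keep_ratio=0.5):
    # Generic sketch: keep the tokens that receive the most attention,
    # averaged over heads and query positions. Illustrative only.
    # x:    (batch, seq, dim)          token representations
    # attn: (batch, heads, seq, seq)   attention weights from a prior layer
    batch, seq, dim = x.shape
    importance = attn.mean(dim=1).mean(dim=1)          # (batch, seq) score per token
    k = max(1, int(seq * keep_ratio))
    keep_idx = importance.topk(k, dim=-1).indices      # (batch, k) kept positions
    keep_idx, _ = keep_idx.sort(dim=-1)                # preserve original token order
    idx = keep_idx.unsqueeze(-1).expand(-1, -1, dim)   # (batch, k, dim)
    return x.gather(dim=1, index=idx)

# Usage: prune half of 16 tokens using random attention weights.
x = torch.randn(2, 16, 64)
attn = torch.softmax(torch.randn(2, 8, 16, 16), dim=-1)
print(prune_tokens_by_attention(x, attn).shape)        # torch.Size([2, 8, 64])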

Degrees of Freedom for Linear Attention: Distilling Softmax Attention with Optimal Feature Efficiency

Naoki Nishikawa, Rei Higuchi, Taiji Suzuki

NeurIPS 2025 · arXiv:2507.03340 · 1 citation
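
Several of the listed papers (this entry, ZeroS, DiJiang, SLAB) deal with linear or kernelized attention. For orientation only, a minimal PyTorch sketch of the generic contrast between softmax attention and linear attention follows; the ELU+1 feature map and all names are common illustrative choices, not taken from any paper in this list.

import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    # Standard attention: cost grows quadratically with sequence length n.
    scale = q.shape[-1] ** -0.5
    scores = torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1)
    return scores @ v

def linear_attention(q, k, v, feature_map=lambda t: F.elu(t) + 1):
    # Kernelized "linear" attention: replace softmax(QK^T) with phi(Q) phi(K)^T
    # and reassociate the matrix products, giving cost linear in n.
    # The ELU+1 feature map is one common choice; distillation papers study
    # how to pick phi so this approximates softmax attention well.
    q, k = feature_map(q), feature_map(k)
    kv = k.transpose(-2, -1) @ v                            # (dim_k, dim_v) summary
    z = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1)   # per-query normalizer
    return (q @ kv) / (z + 1e-6)

q = k = v = torch.randn(1, 128, 64)
print(softmax_attention(q, k, v).shape, linear_attention(q, k, v).shape)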

Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction

Jeffrey Willette, Heejun Lee, Sung Ju Hwang

NeurIPS 2025 · arXiv:2505.11254 · 3 citations

Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models

Yongxin Guo, Zhenglin Cheng, Xiaoying Tang et al.

ICLR 2025 · arXiv:2405.14297 · 36 citations
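
The DynMoE entry above is about auto-tuning mixture-of-experts settings. For context, here is a minimal PyTorch sketch of a standard fixed top-k mixture-of-experts layer; the class name, expert design, and hyperparameters are illustrative assumptions, and the paper's auto-tuning of the expert count and top-k is not reproduced.

import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    # Generic top-k MoE layer: a router scores experts per token and each
    # token is processed by its k highest-scoring experts. Illustrative only.
    def __init__(self, dim, num_experts=4, k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        ])
        self.k = k

    def forward(self, x):                                    # x: (tokens, dim)
        gate = torch.softmax(self.router(x), dim=-1)         # (tokens, experts)
        weight, idx = gate.topk(self.k, dim=-1)              # top-k experts per token
        weight = weight / weight.sum(dim=-1, keepdim=True)   # renormalize gate weights
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.k):
                mask = idx[:, slot] == e                      # tokens routed to expert e
                if mask.any():
                    out[mask] += weight[mask, slot, None] * expert(x[mask])
        return out

moe = TopKMoE(dim=32)
print(moe(torch.randn(10, 32)).shape)                        # torch.Size([10, 32])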

FlashBias: Fast Computation of Attention with Bias

Haixu Wu, Minghao Guo, Yuezhou Ma et al.

NeurIPS 2025 · arXiv:2505.12044 · 1 citation

Fourier Token Merging: Understanding and Capitalizing Frequency Domain for Efficient Image Generation

Jiesong Liu, Xipeng Shen

NeurIPS 2025

LevAttention: Time, Space and Streaming Efficient Algorithm for Heavy Attentions

Ravindran Kannan, Chiranjib Bhattacharyya, Praneeth Kacham et al.

ICLR 2025 · arXiv:2410.05462 · 2 citations

ZeroS: Zero-Sum Linear Attention for Efficient Transformers

Jiecheng Lu, Xu Han, Yan Sun et al.

NeurIPS 2025 (spotlight) · arXiv:2602.05230

DiJiang: Efficient Large Language Models through Compact Kernelization

Hanting Chen, Liuzhicheng Liuzhicheng, Xutao Wang et al.

ICML 2024 · arXiv:2403.19928 · 11 citations

SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization

Jialong Guo, Xinghao Chen, Yehui Tang et al.

ICML 2024 · arXiv:2405.11582 · 34 citations