Spotlight "attention mechanism" Papers

20 papers found

AbsenceBench: Language Models Can’t See What’s Missing

Harvey Yiyun Fu, Aryan Shrivastava, Jared Moore et al.

NeurIPS 2025 · Spotlight

Achilles' Heel of Mamba: Essential difficulties of the Mamba architecture demonstrated by synthetic data

Tianyi Chen, Pengxiao Lin, Zhiwei Wang et al.

NeurIPS 2025 · Spotlight · arXiv:2509.17514 · 1 citation

A Closer Look at Graph Transformers: Cross-Aggregation and Beyond

Jiaming Zhuo, Ziyi Ma, Yintong Lu et al.

NeurIPS 2025 · Spotlight

DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling

Yuang Ai, Qihang Fan, Xuefeng Hu et al.

NeurIPS 2025 · Spotlight · arXiv:2505.11196 · 1 citation

Enhancing Training Data Attribution with Representational Optimization

Weiwei Sun, Haokun Liu, Nikhil Kandpal et al.

NeurIPS 2025 · Spotlight · arXiv:2505.18513

FFN Fusion: Rethinking Sequential Computation in Large Language Models

Akhiad Bercovich, Mohammed Dabbah, Omri Puny et al.

NeurIPS 2025 · Spotlight · arXiv:2503.18908 · 2 citations

Lost in Transmission: When and Why LLMs Fail to Reason Globally

Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan et al.

NeurIPS 2025 · Spotlight · arXiv:2505.08140 · 4 citations

MoBA: Mixture of Block Attention for Long-Context LLMs

Enzhe Lu, Zhejun Jiang, Jingyuan Liu et al.

NeurIPS 2025 · Spotlight · arXiv:2502.13189 · 109 citations

Polyline Path Masked Attention for Vision Transformer

Zhongchen Zhao, Chaodong Xiao, Hui LIN et al.

NeurIPS 2025 · Spotlight · arXiv:2506.15940

Quantum Doubly Stochastic Transformers

Jannis Born, Filip Skogh, Kahn Rhrissorrakrai et al.

NeurIPS 2025 · Spotlight · arXiv:2504.16275 · 2 citations

SparseMVC: Probing Cross-view Sparsity Variations for Multi-view Clustering

Ruimeng Liu, Xin Zou, Chang Tang et al.

NeurIPS 2025 · Spotlight

Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation

Shuo Yang, Haocheng Xi, Yilong Zhao et al.

NeurIPS 2025 · Spotlight · arXiv:2505.18875 · 40 citations

Tensor Product Attention Is All You Need

Yifan Zhang, Yifeng Liu, Huizhuo Yuan et al.

NeurIPS 2025 · Spotlight · arXiv:2501.06425 · 34 citations

Towards Interpretable and Efficient Attention: Compressing All by Contracting a Few

Qishuai Wen, Zhiyuan Huang, Chun-Guang Li

NeurIPS 2025 · Spotlight · arXiv:2509.16875 · 1 citation

Transformer Brain Encoders Explain Human High-Level Visual Responses

Hossein Adeli, Sun Minni, Nikolaus Kriegeskorte

NeurIPS 2025 · Spotlight · arXiv:2505.17329 · 5 citations

TransMLA: Migrating GQA Models to MLA with Full DeepSeek Compatibility and Speedup

Fanxu Meng, Pingzhi Tang, Zengwei Yao et al.

NeurIPS 2025 · Spotlight

InterpreTabNet: Distilling Predictive Signals from Tabular Data by Salient Feature Interpretation

Jacob Si, Wendy Yusi Cheng, Michael Cooper et al.

ICML 2024 · Spotlight · arXiv:2406.00426 · 12 citations

Relaxing the Accurate Imputation Assumption in Doubly Robust Learning for Debiased Collaborative Filtering

Haoxuan Li, Chunyuan Zheng, Shuyi Wang et al.

ICML 2024 · Spotlight

Simple Linear Attention Language Models Balance the Recall-Throughput Tradeoff

Simran Arora, Sabri Eyuboglu, Michael Zhang et al.

ICML 2024 · Spotlight · arXiv:2402.18668 · 140 citations

Sparse and Structured Hopfield Networks

Saúl Santos, Vlad Niculae, Daniel McNamee et al.

ICML 2024 · Spotlight · arXiv:2402.13725 · 12 citations