"transformer architecture" Papers

335 papers found • Page 4 of 7

S-Crescendo: A Nested Transformer Weaving Framework for Scalable Nonlinear System in S-Domain Representation

Junlang Huang, Chen Hao, Li Luo et al.

NEURIPS 2025 • arXiv:2505.11843

SCSA: A Plug-and-Play Semantic Continuous-Sparse Attention for Arbitrary Semantic Style Transfer

Chunnan Shang, Zhizhong Wang, Hongwei Wang et al.

CVPR 2025 (highlight) • arXiv:2503.04119
1 citation

Selective Attention Improves Transformer

Yaniv Leviathan, Matan Kalman, Yossi Matias

ICLR 2025 • arXiv:2410.02703
21 citations

Sequence Complementor: Complementing Transformers for Time Series Forecasting with Learnable Sequences

Xiwen Chen, Peijie Qiu, Wenhui Zhu et al.

AAAI 2025 • arXiv:2501.02735
2 citations

Seurat: From Moving Points to Depth

Seokju Cho, Gabriel Huang, Seungryong Kim et al.

CVPR 2025 (highlight) • arXiv:2504.14687
9 citations

SimulMEGA: MoE Routers are Advanced Policy Makers for Simultaneous Speech Translation

Chenyang Le, Bing Han, Jinshun Li et al.

NEURIPS 2025 • arXiv:2509.01200

Spark Transformer: Reactivating Sparsity in Transformer FFN and Attention

Chong You, Kan Wu, Zhipeng Jia et al.

NEURIPS 2025
2 citations

Spatial-Temporal Knowledge Distillation for Takeaway Recommendation

Shuyuan Zhao, Wei Chen, Boyan Shi et al.

AAAI 2025 • arXiv:2412.16502
1 citation

SphereUFormer: A U-Shaped Transformer for Spherical 360 Perception

Yaniv Benny, Lior Wolf

CVPR 2025 • arXiv:2412.06968
5 citations

Spike2Former: Efficient Spiking Transformer for High-performance Image Segmentation

Zhenxin Lei, Man Yao, Jiakui Hu et al.

AAAI 2025 • arXiv:2412.14587
14 citations

Spiking Neural Networks Need High-Frequency Information

Yuetong Fang, Deming Zhou, Ziqing Wang et al.

NEURIPS 2025 • arXiv:2505.18608
1 citation

SpiLiFormer: Enhancing Spiking Transformers with Lateral Inhibition

Zeqi Zheng, Yanchen Huang, Yingchao Yu et al.

ICCV 2025 • arXiv:2503.15986
1 citation

SpinQuant: LLM Quantization with Learned Rotations

Zechun Liu, Changsheng Zhao, Igor Fedorov et al.

ICLR 2025 • arXiv:2405.16406
268 citations

StarTrail: Concentric Ring Sequence Parallelism for Efficient Near-Infinite-Context Transformer Model Training

Ziming Liu, Shaoyu Wang, Shenggan Cheng et al.

NEURIPS 2025 • arXiv:2407.00611
2 citations

STEP: A Unified Spiking Transformer Evaluation Platform for Fair and Reproducible Benchmarking

Sicheng Shen, Dongcheng Zhao, Linghao Feng et al.

NEURIPS 2025 (oral) • arXiv:2505.11151
3 citations

STORM: Spatio-TempOral Reconstruction Model For Large-Scale Outdoor Scenes

Jiawei Yang, Jiahui Huang, Boris Ivanovic et al.

ICLR 2025 (oral) • arXiv:2501.00602
25 citations

Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget

Vikash Sehwag, Xianghao Kong, Jingtao Li et al.

CVPR 2025 • arXiv:2407.15811
26 citations

SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration

Heming Xia, Yongqi Li, Jun Zhang et al.

ICLR 2025 • arXiv:2410.06916
41 citations

SymmCompletion: High-Fidelity and High-Consistency Point Cloud Completion with Symmetry Guidance

Hongyu Yan, Zijun Li, Kunming Luo et al.

AAAI 2025 • arXiv:2503.18007
12 citations

Systematic Outliers in Large Language Models

Yongqi An, Xu Zhao, Tao Yu et al.

ICLR 2025 • arXiv:2502.06415
19 citations

TAB: Transformer Attention Bottlenecks enable User Intervention and Debugging in Vision-Language Models

Pooyan Rahmanzadehgervi, Hung Nguyen, Rosanne Liu et al.

ICCV 2025 • arXiv:2412.18675
1 citation

TAMER: Tree-Aware Transformer for Handwritten Mathematical Expression Recognition

Jianhua Zhu, Wenqi Zhao, Yu Li et al.

AAAI 2025 • arXiv:2408.08578
8 citations

TARFVAE: Efficient One-Step Generative Time Series Forecasting via TARFLOW based VAE

Jiawen Wei, Jiang Lan, Pengbo Wei et al.

NEURIPS 2025 • arXiv:2511.22853

Task Descriptors Help Transformers Learn Linear Models In-Context

Ruomin Huang, Rong Ge

ICLR 2025
3 citations

Technical Debt in In-Context Learning: Diminishing Efficiency in Long Context

Taejong Joo, Diego Klabjan

NEURIPS 2025 • arXiv:2502.04580

The emergence of sparse attention: impact of data distribution and benefits of repetition

Nicolas Zucchet, Francesco D'Angelo, Andrew Lampinen et al.

NEURIPS 2025 (oral) • arXiv:2505.17863
7 citations

Theory, Analysis, and Best Practices for Sigmoid Self-Attention

Jason Ramapuram, Federico Danieli, Eeshan Gunesh Dhekane et al.

ICLR 2025 • arXiv:2409.04431
39 citations

TimeCHEAT: A Channel Harmony Strategy for Irregularly Sampled Multivariate Time Series Analysis

Jiexi Liu, Meng Cao, Songcan Chen

AAAI 2025 • arXiv:2412.12886
9 citations

Time-Masked Transformers with Lightweight Test-Time Adaptation for Neural Speech Decoding

Ebrahim Feghhi, Shreyas Kaasyap, Nima Hadidi et al.

NEURIPS 2025 • arXiv:2507.02800
1 citation

Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction

Ziyang Wu, Tianjiao Ding, Yifu Lu et al.

ICLR 2025 • arXiv:2412.17810
28 citations

Towards Neural Scaling Laws for Time Series Foundation Models

Qingren Yao, Chao-Han Huck Yang, Renhe Jiang et al.

ICLR 2025 • arXiv:2410.12360
26 citations

Towards Provable Emergence of In-Context Reinforcement Learning

Jiuqi Wang, Rohan Chandra, Shangtong Zhang

NEURIPS 2025 (oral) • arXiv:2509.18389
1 citation

Transformer brain encoders explain human high-level visual responses

Hossein Adeli, Sun Minni, Nikolaus Kriegeskorte

NEURIPS 2025 (spotlight) • arXiv:2505.17329
5 citations

Transformer Learns Optimal Variable Selection in Group-Sparse Classification

Chenyang Zhang, Xuran Meng, Yuan Cao

ICLR 2025 • arXiv:2504.08638
4 citations

Transformers are almost optimal metalearners for linear classification

Roey Magen, Gal Vardi

NEURIPS 2025 • arXiv:2510.19797
1 citation

Transformers Can Learn Temporal Difference Methods for In-Context Reinforcement Learning

Jiuqi Wang, Ethan Blaser, Hadi Daneshmand et al.

ICLR 2025 (oral) • arXiv:2405.13861
15 citations

Transformers Handle Endogeneity in In-Context Linear Regression

Haodong Liang, Krishna Balasubramanian, Lifeng Lai

ICLR 2025 • arXiv:2410.01265
4 citations

Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought

Jianhao Huang, Zixuan Wang, Jason Lee

ICLR 2025 • arXiv:2502.21212
22 citations

Transformers Provably Learn Chain-of-Thought Reasoning with Length Generalization

Yu Huang, Zixin Wen, Aarti Singh et al.

NEURIPS 2025 • arXiv:2511.07378
5 citations

Transformers Struggle to Learn to Search

Abulhair Saparov, Srushti Ajay Pawar, Shreyas Pimpalgaonkar et al.

ICLR 2025 • arXiv:2412.04703
15 citations

Transformers without Normalization

Jiachen Zhu, Xinlei Chen, Kaiming He et al.

CVPR 2025 • arXiv:2503.10622
103 citations

TransMLA: Migrating GQA Models to MLA with Full DeepSeek Compatibility and Speedup

Fanxu Meng, Pingzhi Tang, Zengwei Yao et al.

NEURIPS 2025 (spotlight)

UFM: A Simple Path towards Unified Dense Correspondence with Flow

Yuchen Zhang, Nikhil Keetha, Chenwei Lyu et al.

NEURIPS 2025 • arXiv:2506.09278
13 citations

Understanding the Generalization of In-Context Learning in Transformers: An Empirical Study

Xingxuan Zhang, Haoran Wang, Jiansheng Li et al.

ICLR 2025 • arXiv:2503.15579
5 citations

UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping

Wenbo Wang, Fangyun Wei, Lei Zhou et al.

CVPR 2025 • arXiv:2412.02699
16 citations

UniMotion: A Unified Motion Framework for Simulation, Prediction and Planning

Nan Song, Junzhe Jiang, Jingyu Li et al.

NEURIPS 2025 (oral)

Universal Few-shot Spatial Control for Diffusion Models

Kiet Nguyen, Chanhyuk Lee, Donggyun Kim et al.

NEURIPS 2025 • arXiv:2509.07530

Unlabeled Data Can Provably Enhance In-Context Learning of Transformers

Renpu Liu, Jing Yang

NEURIPS 2025 • arXiv:2601.10058
1 citation

Vocabulary In-Context Learning in Transformers: Benefits of Positional Encoding

Qian Ma, Ruoxiang Xu, Yongqiang Cai

NEURIPS 2025 • arXiv:2511.06376

What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis

Weronika Ormaniec, Felix Dangel, Sidak Pal Singh

ICLR 2025 • arXiv:2410.10986
10 citations