"transformer architecture" Papers
335 papers found • Page 4 of 7
S-Crescendo: A Nested Transformer Weaving Framework for Scalable Nonlinear System in S-Domain Representation
Junlang Huang, Chen Hao, Li Luo et al.
SCSA: A Plug-and-Play Semantic Continuous-Sparse Attention for Arbitrary Semantic Style Transfer
Chunnan Shang, Zhizhong Wang, Hongwei Wang et al.
Selective Attention Improves Transformer
Yaniv Leviathan, Matan Kalman, Yossi Matias
Sequence Complementor: Complementing Transformers for Time Series Forecasting with Learnable Sequences
Xiwen Chen, Peijie Qiu, Wenhui Zhu et al.
Seurat: From Moving Points to Depth
Seokju Cho, Gabriel Huang, Seungryong Kim et al.
SimulMEGA: MoE Routers are Advanced Policy Makers for Simultaneous Speech Translation
Chenyang Le, Bing Han, Jinshun Li et al.
Spark Transformer: Reactivating Sparsity in Transformer FFN and Attention
Chong You, Kan Wu, Zhipeng Jia et al.
Spatial-Temporal Knowledge Distillation for Takeaway Recommendation
Shuyuan Zhao, Wei Chen, Boyan Shi et al.
SphereUFormer: A U-Shaped Transformer for Spherical 360 Perception
Yaniv Benny, Lior Wolf
Spike2Former: Efficient Spiking Transformer for High-performance Image Segmentation
Zhenxin Lei, Man Yao, Jiakui Hu et al.
Spiking Neural Networks Need High-Frequency Information
Yuetong Fang, Deming Zhou, Ziqing Wang et al.
SpiLiFormer: Enhancing Spiking Transformers with Lateral Inhibition
Zeqi Zheng, Yanchen Huang, Yingchao Yu et al.
SpinQuant: LLM Quantization with Learned Rotations
Zechun Liu, Changsheng Zhao, Igor Fedorov et al.
StarTrail: Concentric Ring Sequence Parallelism for Efficient Near-Infinite-Context Transformer Model Training
Ziming Liu, Shaoyu Wang, Shenggan Cheng et al.
STEP: A Unified Spiking Transformer Evaluation Platform for Fair and Reproducible Benchmarking
Sicheng Shen, Dongcheng Zhao, Linghao Feng et al.
STORM: Spatio-TempOral Reconstruction Model For Large-Scale Outdoor Scenes
Jiawei Yang, Jiahui Huang, Boris Ivanovic et al.
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag, Xianghao Kong, Jingtao Li et al.
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Heming Xia, Yongqi Li, Jun Zhang et al.
SymmCompletion: High-Fidelity and High-Consistency Point Cloud Completion with Symmetry Guidance
Hongyu Yan, Zijun Li, Kunming Luo et al.
Systematic Outliers in Large Language Models
Yongqi An, Xu Zhao, Tao Yu et al.
TAB: Transformer Attention Bottlenecks enable User Intervention and Debugging in Vision-Language Models
Pooyan Rahmanzadehgervi, Hung Nguyen, Rosanne Liu et al.
TAMER: Tree-Aware Transformer for Handwritten Mathematical Expression Recognition
Jianhua Zhu, Wenqi Zhao, Yu Li et al.
TARFVAE: Efficient One-Step Generative Time Series Forecasting via TARFLOW based VAE
Jiawen Wei, Jiang Lan, Pengbo Wei et al.
Task Descriptors Help Transformers Learn Linear Models In-Context
Ruomin Huang, Rong Ge
Technical Debt in In-Context Learning: Diminishing Efficiency in Long Context
Taejong Joo, Diego Klabjan
The emergence of sparse attention: impact of data distribution and benefits of repetition
Nicolas Zucchet, Francesco D'Angelo, Andrew Lampinen et al.
Theory, Analysis, and Best Practices for Sigmoid Self-Attention
Jason Ramapuram, Federico Danieli, Eeshan Gunesh Dhekane et al.
TimeCHEAT: A Channel Harmony Strategy for Irregularly Sampled Multivariate Time Series Analysis
Jiexi Liu, Meng Cao, Songcan Chen
Time-Masked Transformers with Lightweight Test-Time Adaptation for Neural Speech Decoding
Ebrahim Feghhi, Shreyas Kaasyap, Nima Hadidi et al.
Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction
Ziyang Wu, Tianjiao Ding, Yifu Lu et al.
Towards Neural Scaling Laws for Time Series Foundation Models
Qingren Yao, Chao-Han Huck Yang, Renhe Jiang et al.
Towards Provable Emergence of In-Context Reinforcement Learning
Jiuqi Wang, Rohan Chandra, Shangtong Zhang
Transformer brain encoders explain human high-level visual responses
Hossein Adeli, Sun Minni, Nikolaus Kriegeskorte
Transformer Learns Optimal Variable Selection in Group-Sparse Classification
Chenyang Zhang, Xuran Meng, Yuan Cao
Transformers are almost optimal metalearners for linear classification
Roey Magen, Gal Vardi
Transformers Can Learn Temporal Difference Methods for In-Context Reinforcement Learning
Jiuqi Wang, Ethan Blaser, Hadi Daneshmand et al.
Transformers Handle Endogeneity in In-Context Linear Regression
Haodong Liang, Krishna Balasubramanian, Lifeng Lai
Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought
Jianhao Huang, Zixuan Wang, Jason Lee
Transformers Provably Learn Chain-of-Thought Reasoning with Length Generalization
Yu Huang, Zixin Wen, Aarti Singh et al.
Transformers Struggle to Learn to Search
Abulhair Saparov, Srushti Ajay Pawar, Shreyas Pimpalgaonkar et al.
Transformers without Normalization
Jiachen Zhu, Xinlei Chen, Kaiming He et al.
TransMLA: Migrating GQA Models to MLA with Full DeepSeek Compatibility and Speedup
Fanxu Meng, Pingzhi Tang, Zengwei Yao et al.
UFM: A Simple Path towards Unified Dense Correspondence with Flow
Yuchen Zhang, Nikhil Keetha, Chenwei Lyu et al.
Understanding the Generalization of In-Context Learning in Transformers: An Empirical Study
Xingxuan Zhang, Haoran Wang, Jiansheng Li et al.
UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping
Wenbo Wang, Fangyun Wei, Lei Zhou et al.
UniMotion: A Unified Motion Framework for Simulation, Prediction and Planning
Nan Song, Junzhe Jiang, Jingyu Li et al.
Universal Few-shot Spatial Control for Diffusion Models
Kiet Nguyen, Chanhyuk Lee, Donggyun Kim et al.
Unlabeled Data Can Provably Enhance In-Context Learning of Transformers
Renpu Liu, Jing Yang
Vocabulary In-Context Learning in Transformers: Benefits of Positional Encoding
Qian Ma, Ruoxiang Xu, Yongqiang Cai
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec, Felix Dangel, Sidak Pal Singh