Poster • "transformer architecture" Papers

257 papers found • Page 3 of 6

Principles of Visual Tokens for Efficient Video Understanding

Xinyue Hao, Li, Shreyank Gowda et al.

ICCV 2025 • arXiv:2411.13626
1 citation

PS3: A Multimodal Transformer Integrating Pathology Reports with Histology Images and Biological Pathways for Cancer Survival Prediction

Manahil Raza, Ayesha Azam, Talha Qaiser et al.

ICCV 2025 • arXiv:2509.20022
1 citation

RAPTR: Radar-based 3D Pose Estimation using Transformer

Sorachi Kato, Ryoma Yataka, Pu Wang et al.

NEURIPS 2025 • arXiv:2511.08387

RaySt3R: Predicting Novel Depth Maps for Zero-Shot Object Completion

Bardienus Duisterhof, Jan Oberst, Bowen Wen et al.

NEURIPS 2025 • arXiv:2506.05285
4 citations

Reconstructing Humans with a Biomechanically Accurate Skeleton

Yan Xia, Xiaowei Zhou, Etienne Vouga et al.

CVPR 2025 • arXiv:2503.21751
7 citations

Recurrent Self-Attention Dynamics: An Energy-Agnostic Perspective from Jacobians

Akiyoshi Tomihari, Ryo Karakida

NEURIPS 2025 • arXiv:2505.19458
2 citations

Revisiting Convolution Architecture in the Realm of DNA Foundation Models

Yu Bo, Weian Mao, Daniel Shao et al.

ICLR 2025 • arXiv:2502.18538
4 citations

REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning

Jihyun Lee, Weipeng Xu, Alexander Richard et al.

CVPR 2025 • arXiv:2504.04956
9 citations

RNNs are not Transformers (Yet): The Key Bottleneck on In-Context Retrieval

Kaiyue Wen, Xingyu Dang, Kaifeng Lyu

ICLR 2025 • arXiv:2402.18510
51 citations

Rope to Nope and Back Again: A New Hybrid Attention Strategy

Bowen Yang, Bharat Venkitesh, Dwaraknath Gnaneshwar Talupuru et al.

NEURIPS 2025 • arXiv:2501.18795
20 citations

SAM 2: Segment Anything in Images and Videos

Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu et al.

ICLR 2025 • arXiv:2408.00714
2393 citations

SAS: Simulated Attention Score

Chuanyang Zheng, Jiankai Sun, Yihang Gao et al.

NEURIPS 2025 • arXiv:2507.07694
2 citations

Scaling Transformers for Low-Bitrate High-Quality Speech Coding

Julian Parker, Anton Smirnov, Jordi Pons et al.

ICLR 2025 • arXiv:2411.19842
62 citations

S-Crescendo: A Nested Transformer Weaving Framework for Scalable Nonlinear System in S-Domain Representation

Junlang Huang, Chen Hao, Li Luo et al.

NEURIPS 2025 • arXiv:2505.11843

Selective Attention Improves Transformer

Yaniv Leviathan, Matan Kalman, Yossi Matias

ICLR 2025 • arXiv:2410.02703
21 citations

SimulMEGA: MoE Routers are Advanced Policy Makers for Simultaneous Speech Translation

Chenyang Le, Bing Han, Jinshun Li et al.

NEURIPS 2025 • arXiv:2509.01200

Spark Transformer: Reactivating Sparsity in Transformer FFN and Attention

Chong You, Kan Wu, Zhipeng Jia et al.

NEURIPS 2025
2 citations

SphereUFormer: A U-Shaped Transformer for Spherical 360 Perception

Yaniv Benny, Lior Wolf

CVPR 2025 • arXiv:2412.06968
5 citations

Spiking Neural Networks Need High-Frequency Information

Yuetong Fang, Deming Zhou, Ziqing Wang et al.

NEURIPS 2025 • arXiv:2505.18608
1 citation

SpiLiFormer: Enhancing Spiking Transformers with Lateral Inhibition

Zeqi Zheng, Yanchen Huang, Yingchao Yu et al.

ICCV 2025 • arXiv:2503.15986
1 citation

SpinQuant: LLM Quantization with Learned Rotations

Zechun Liu, Changsheng Zhao, Igor Fedorov et al.

ICLR 2025 • arXiv:2405.16406
268 citations

StarTrail: Concentric Ring Sequence Parallelism for Efficient Near-Infinite-Context Transformer Model Training

Ziming Liu, Shaoyu Wang, Shenggan Cheng et al.

NEURIPS 2025 • arXiv:2407.00611
2 citations

Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget

Vikash Sehwag, Xianghao Kong, Jingtao Li et al.

CVPR 2025 • arXiv:2407.15811
26 citations

SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration

Heming Xia, Yongqi Li, Jun Zhang et al.

ICLR 2025 • arXiv:2410.06916
41 citations

Systematic Outliers in Large Language Models

Yongqi An, Xu Zhao, Tao Yu et al.

ICLR 2025 • arXiv:2502.06415
19 citations

TAB: Transformer Attention Bottlenecks enable User Intervention and Debugging in Vision-Language Models

Pooyan Rahmanzadehgervi, Hung Nguyen, Rosanne Liu et al.

ICCV 2025 • arXiv:2412.18675
1 citation

TARFVAE: Efficient One-Step Generative Time Series Forecasting via TARFLOW based VAE

Jiawen Wei, Jiang Lan, Pengbo Wei et al.

NEURIPS 2025 • arXiv:2511.22853

Task Descriptors Help Transformers Learn Linear Models In-Context

Ruomin Huang, Rong Ge

ICLR 2025
3 citations

Technical Debt in In-Context Learning: Diminishing Efficiency in Long Context

Taejong Joo, Diego Klabjan

NEURIPS 2025 • arXiv:2502.04580

Theory, Analysis, and Best Practices for Sigmoid Self-Attention

Jason Ramapuram, Federico Danieli, Eeshan Gunesh Dhekane et al.

ICLR 2025 • arXiv:2409.04431
39 citations

Time-Masked Transformers with Lightweight Test-Time Adaptation for Neural Speech Decoding

Ebrahim Feghhi, Shreyas Kaasyap, Nima Hadidi et al.

NEURIPS 2025 • arXiv:2507.02800
1 citation

Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction

Ziyang Wu, Tianjiao Ding, Yifu Lu et al.

ICLR 2025 • arXiv:2412.17810
28 citations

Towards Neural Scaling Laws for Time Series Foundation Models

Qingren Yao, Chao-Han Huck Yang, Renhe Jiang et al.

ICLR 2025 • arXiv:2410.12360
26 citations

Transformer Learns Optimal Variable Selection in Group-Sparse Classification

Chenyang Zhang, Xuran Meng, Yuan Cao

ICLR 2025 • arXiv:2504.08638
4 citations

Transformers are almost optimal metalearners for linear classification

Roey Magen, Gal Vardi

NEURIPS 2025 • arXiv:2510.19797
1 citation

Transformers Handle Endogeneity in In-Context Linear Regression

Haodong Liang, Krishna Balasubramanian, Lifeng Lai

ICLR 2025 • arXiv:2410.01265
4 citations

Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought

Jianhao Huang, Zixuan Wang, Jason Lee

ICLR 2025 • arXiv:2502.21212
22 citations

Transformers Provably Learn Chain-of-Thought Reasoning with Length Generalization

Yu Huang, Zixin Wen, Aarti Singh et al.

NEURIPS 2025 • arXiv:2511.07378
5 citations

Transformers Struggle to Learn to Search

Abulhair Saparov, Srushti Ajay Pawar, Shreyas Pimpalgaonkar et al.

ICLR 2025 • arXiv:2412.04703
15 citations

Transformers without Normalization

Jiachen Zhu, Xinlei Chen, Kaiming He et al.

CVPR 2025 • arXiv:2503.10622
103 citations

UFM: A Simple Path towards Unified Dense Correspondence with Flow

Yuchen Zhang, Nikhil Keetha, Chenwei Lyu et al.

NEURIPS 2025 • arXiv:2506.09278
13 citations

Understanding the Generalization of In-Context Learning in Transformers: An Empirical Study

Xingxuan Zhang, Haoran Wang, Jiansheng Li et al.

ICLR 2025 • arXiv:2503.15579
5 citations

UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping

Wenbo Wang, Fangyun Wei, Lei Zhou et al.

CVPR 2025 • arXiv:2412.02699
16 citations

Universal Few-shot Spatial Control for Diffusion Models

Kiet Nguyen, Chanhyuk Lee, Donggyun Kim et al.

NEURIPS 2025 • arXiv:2509.07530

Unlabeled Data Can Provably Enhance In-Context Learning of Transformers

Renpu Liu, Jing Yang

NEURIPS 2025 • arXiv:2601.10058
1 citation

Vocabulary In-Context Learning in Transformers: Benefits of Positional Encoding

Qian Ma, Ruoxiang Xu, Yongqiang Cai

NEURIPS 2025 • arXiv:2511.06376

What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis

Weronika Ormaniec, Felix Dangel, Sidak Pal Singh

ICLR 2025 • arXiv:2410.10986
10 citations

What Makes a Good Diffusion Planner for Decision Making?

Haofei Lu, Dongqi Han, Yifei Shen et al.

ICLR 2025 • arXiv:2503.00535
27 citations

Why In-Context Learning Models are Good Few-Shot Learners?

Shiguang Wu, Yaqing Wang, Quanming Yao

ICLR 2025

ZETA: Leveraging $Z$-order Curves for Efficient Top-$k$ Attention

Qiuhao Zeng, Jierui Huang, Peng Lu et al.

ICLR 2025 • arXiv:2501.14577
5 citations