Poster "transformer architecture" Papers
257 papers found • Page 3 of 6
Principles of Visual Tokens for Efficient Video Understanding
Xinyue Hao, Li, Shreyank Gowda et al.
PS3: A Multimodal Transformer Integrating Pathology Reports with Histology Images and Biological Pathways for Cancer Survival Prediction
Manahil Raza, Ayesha Azam, Talha Qaiser et al.
RAPTR: Radar-based 3D Pose Estimation using Transformer
Sorachi Kato, Ryoma Yataka, Pu Wang et al.
RaySt3R: Predicting Novel Depth Maps for Zero-Shot Object Completion
Bardienus Duisterhof, Jan Oberst, Bowen Wen et al.
Reconstructing Humans with a Biomechanically Accurate Skeleton
Yan Xia, Xiaowei Zhou, Etienne Vouga et al.
Recurrent Self-Attention Dynamics: An Energy-Agnostic Perspective from Jacobians
Akiyoshi Tomihari, Ryo Karakida
Revisiting Convolution Architecture in the Realm of DNA Foundation Models
Yu Bo, Weian Mao, Daniel Shao et al.
REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning
Jihyun Lee, Weipeng Xu, Alexander Richard et al.
RNNs are not Transformers (Yet): The Key Bottleneck on In-Context Retrieval
Kaiyue Wen, Xingyu Dang, Kaifeng Lyu
Rope to Nope and Back Again: A New Hybrid Attention Strategy
Bowen Yang, Bharat Venkitesh, Dwaraknath Gnaneshwar Talupuru et al.
SAM 2: Segment Anything in Images and Videos
Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu et al.
SAS: Simulated Attention Score
Chuanyang Zheng, Jiankai Sun, Yihang Gao et al.
Scaling Transformers for Low-Bitrate High-Quality Speech Coding
Julian Parker, Anton Smirnov, Jordi Pons et al.
S-Crescendo: A Nested Transformer Weaving Framework for Scalable Nonlinear System in S-Domain Representation
Junlang Huang, Chen Hao, Li Luo et al.
Selective Attention Improves Transformer
Yaniv Leviathan, Matan Kalman, Yossi Matias
SimulMEGA: MoE Routers are Advanced Policy Makers for Simultaneous Speech Translation
Chenyang Le, Bing Han, Jinshun Li et al.
Spark Transformer: Reactivating Sparsity in Transformer FFN and Attention
Chong You, Kan Wu, Zhipeng Jia et al.
SphereUFormer: A U-Shaped Transformer for Spherical 360 Perception
Yaniv Benny, Lior Wolf
Spiking Neural Networks Need High-Frequency Information
Yuetong Fang, Deming Zhou, Ziqing Wang et al.
SpiLiFormer: Enhancing Spiking Transformers with Lateral Inhibition
Zeqi Zheng, Yanchen Huang, Yingchao Yu et al.
SpinQuant: LLM Quantization with Learned Rotations
Zechun Liu, Changsheng Zhao, Igor Fedorov et al.
StarTrail: Concentric Ring Sequence Parallelism for Efficient Near-Infinite-Context Transformer Model Training
Ziming Liu, Shaoyu Wang, Shenggan Cheng et al.
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag, Xianghao Kong, Jingtao Li et al.
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Heming Xia, Yongqi Li, Jun Zhang et al.
Systematic Outliers in Large Language Models
Yongqi An, Xu Zhao, Tao Yu et al.
TAB: Transformer Attention Bottlenecks enable User Intervention and Debugging in Vision-Language Models
Pooyan Rahmanzadehgervi, Hung Nguyen, Rosanne Liu et al.
TARFVAE: Efficient One-Step Generative Time Series Forecasting via TARFLOW based VAE
Jiawen Wei, Jiang Lan, Pengbo Wei et al.
Task Descriptors Help Transformers Learn Linear Models In-Context
Ruomin Huang, Rong Ge
Technical Debt in In-Context Learning: Diminishing Efficiency in Long Context
Taejong Joo, Diego Klabjan
Theory, Analysis, and Best Practices for Sigmoid Self-Attention
Jason Ramapuram, Federico Danieli, Eeshan Gunesh Dhekane et al.
Time-Masked Transformers with Lightweight Test-Time Adaptation for Neural Speech Decoding
Ebrahim Feghhi, Shreyas Kaasyap, Nima Hadidi et al.
Token Statistics Transformer: Linear-Time Attention via Variational Rate Reduction
Ziyang Wu, Tianjiao Ding, Yifu Lu et al.
Towards Neural Scaling Laws for Time Series Foundation Models
Qingren Yao, Chao-Han Huck Yang, Renhe Jiang et al.
Transformer Learns Optimal Variable Selection in Group-Sparse Classification
Chenyang Zhang, Xuran Meng, Yuan Cao
Transformers are almost optimal metalearners for linear classification
Roey Magen, Gal Vardi
Transformers Handle Endogeneity in In-Context Linear Regression
Haodong Liang, Krishna Balasubramanian, Lifeng Lai
Transformers Learn to Implement Multi-step Gradient Descent with Chain of Thought
Jianhao Huang, Zixuan Wang, Jason Lee
Transformers Provably Learn Chain-of-Thought Reasoning with Length Generalization
Yu Huang, Zixin Wen, Aarti Singh et al.
Transformers Struggle to Learn to Search
Abulhair Saparov, Srushti Ajay Pawar, Shreyas Pimpalgaonkar et al.
Transformers without Normalization
Jiachen Zhu, Xinlei Chen, Kaiming He et al.
UFM: A Simple Path towards Unified Dense Correspondence with Flow
Yuchen Zhang, Nikhil Keetha, Chenwei Lyu et al.
Understanding the Generalization of In-Context Learning in Transformers: An Empirical Study
Xingxuan Zhang, Haoran Wang, Jiansheng Li et al.
UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping
Wenbo Wang, Fangyun Wei, Lei Zhou et al.
Universal Few-shot Spatial Control for Diffusion Models
Kiet Nguyen, Chanhyuk Lee, Donggyun Kim et al.
Unlabeled Data Can Provably Enhance In-Context Learning of Transformers
Renpu Liu, Jing Yang
Vocabulary In-Context Learning in Transformers: Benefits of Positional Encoding
Qian Ma, Ruoxiang Xu, Yongqiang Cai
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec, Felix Dangel, Sidak Pal Singh
What Makes a Good Diffusion Planner for Decision Making?
Haofei Lu, Dongqi Han, Yifei Shen et al.
Why In-Context Learning Models are Good Few-Shot Learners?
Shiguang Wu, Yaqing Wang, Quanming Yao
ZETA: Leveraging $Z$-order Curves for Efficient Top-$k$ Attention
Qiuhao Zeng, Jierui Huang, Peng Lu et al.