"transformer architecture" Papers
335 papers found • Page 7 of 7
SeTformer Is What You Need for Vision and Language
Pourya Shamsolmoali, Masoumeh Zareapoor, Eric Granger et al.
Slot Abstractors: Toward Scalable Abstract Visual Reasoning
Shanka Subhra Mondal, Jonathan Cohen, Taylor Webb
SMFANet: A Lightweight Self-Modulation Feature Aggregation Network for Efficient Image Super-Resolution
Mingjun Zheng, Long Sun, Jiangxin Dong et al.
Spherical World-Locking for Audio-Visual Localization in Egocentric Videos
Heeseung Yun, Ruohan Gao, Ishwarya Ananthabhotla et al.
SpikeZIP-TF: Conversion is All You Need for Transformer-based SNN
Kang You, Zekai Xu, Chen Nie et al.
Surface-VQMAE: Vector-quantized Masked Auto-encoders on Molecular Surfaces
Fang Wu, Stan Z. Li
Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts
Byeongjun Park, Hyojun Go, Jin-Young Kim et al.
Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction
Junuk Cha, Jihyeon Kim, Jae Shin Yoon et al.
Text-Conditioned Resampler For Long Form Video Understanding
Bruno Korbar, Yongqin Xian, Alessio Tonioni et al.
The Illusion of State in State-Space Models
William Merrill, Jackson Petty, Ashish Sabharwal
The Pitfalls of Next-Token Prediction
Gregor Bachmann, Vaishnavh Nagarajan
Towards Causal Foundation Model: on Duality between Optimal Balancing and Attention
Jiaqi Zhang, Joel Jennings, Agrin Hilmkil et al.
Towards Efficient Spiking Transformer: a Token Sparsification Framework for Training and Inference Acceleration
Zhengyang Zhuge, Peisong Wang, Xingting Yao et al.
Towards General Algorithm Discovery for Combinatorial Optimization: Learning Symbolic Branching Policy from Bipartite Graph
Yufei Kuang, Jie Wang, Yuyan Zhou et al.
Towards Scalable 3D Anomaly Detection and Localization: A Benchmark via 3D Anomaly Synthesis and A Self-Supervised Learning Network
Wenqiao Li, Xiaohao Xu, Yao Gu et al.
Towards Understanding Inductive Bias in Transformers: A View From Infinity
Itay Lavie, Guy Gur-Ari, Zohar Ringel
Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random Features
Simone Bombari, Marco Mondelli
Trainable Transformer in Transformer
Abhishek Panigrahi, Sadhika Malladi, Mengzhou Xia et al.
Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary
Leheng Zhang, Yawei Li, Xingyu Zhou et al.
Transformer-Based No-Reference Image Quality Assessment via Supervised Contrastive Learning
Jinsong Shi, Pan Gao, Jie Qin
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Tri Dao, Albert Gu
Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape
Juno Kim, Taiji Suzuki
Translation Equivariant Transformer Neural Processes
Matthew Ashman, Cristiana Diaconu, Junhyuck Kim et al.
Transolver: A Fast Transformer Solver for PDEs on General Geometries
Haixu Wu, Huakun Luo, Haowen Wang et al.
Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers
Zi-Xin Zou, Zhipeng Yu, Yuan-Chen Guo et al.
Unveiling Advanced Frequency Disentanglement Paradigm for Low-Light Image Enhancement
Kun Zhou, Xinyu Lin, Wenbo Li et al.
Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention
Zhen Qin, Weigao Sun, Dong Li et al.
View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network
Quan Zhang, Lei Wang, Vishal M. Patel et al.
Viewing Transformers Through the Lens of Long Convolutions Layers
Itamar Zimerman, Lior Wolf
VSFormer: Visual-Spatial Fusion Transformer for Correspondence Pruning
Tangfei Liao, Xiaoqin Zhang, Li Zhao et al.
Wavelength-Embedding-guided Filter-Array Transformer for Spectral Demosaicing
Haijin Zeng, Hiep Luong, Wilfried Philips
What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks
Xingwu Chen, Difan Zou
When Fast Fourier Transform Meets Transformer for Image Restoration
Xingyu Jiang, Xiuhui Zhang, Ning Gao et al.
When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
Haoran You, Yichao Fu, Zheng Wang et al.
X4D-SceneFormer: Enhanced Scene Understanding on 4D Point Cloud Videos through Cross-Modal Knowledge Transfer
Linglin Jing, Ying Xue, Xu Yan et al.