Poster "transformer architecture" Papers

257 papers found • Page 2 of 6

First Attentions Last: Better Exploiting First Attentions for Efficient Parallel Training

Gyudong Kim, Hyukju Na, Jin Kim et al.

NEURIPS 2025

Flatten Graphs as Sequences: Transformers are Scalable Graph Generators

Dexiong Chen, Markus Krimmel, Karsten Borgwardt

NEURIPS 2025 • arXiv:2502.02216 • 4 citations

Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization

Kyle Sargent, Kyle Hsu, Justin Johnson et al.

ICCV 2025 • arXiv:2503.11056 • 27 citations

FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction

Jiale Xu, Shenghua Gao, Ying Shan

ICCV 2025 • arXiv:2412.09573 • 19 citations

From Attention to Activation: Unraveling the Enigmas of Large Language Models

Prannay Kaul, Chengcheng Ma, Ismail Elezi et al.

ICLR 2025 • arXiv:2410.17174 • 8 citations

Gemstones: A Model Suite for Multi-Faceted Scaling Laws

Sean McLeish, John Kirchenbauer, David Miller et al.

NEURIPS 2025 • arXiv:2502.06857 • 10 citations

Generation as Search Operator for Test-Time Scaling of Diffusion-based Combinatorial Optimization

Yang Li, Lvda Chen, Haonan Wang et al.

NEURIPS 2025

GEOPARD: Geometric Pretraining for Articulation Prediction in 3D Shapes

Pradyumn Goyal, Dmitrii Petrov, Sheldon Andrews et al.

ICCV 2025 • arXiv:2504.02747 • 3 citations

Grammar Reinforcement Learning: path and cycle counting in graphs with a Context-Free Grammar and Transformer approach

Jason Piquenot, Maxime Berar, Romain Raveaux et al.

ICLR 2025

HeatFormer: A Neural Optimizer for Multiview Human Mesh Recovery

Yuto Matsubara, Ko Nishino

CVPR 2025 • arXiv:2412.04456 • 1 citation

HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts

Neil He, Rishabh Anand, Hiren Madhu et al.

NEURIPS 2025 • arXiv:2505.24722 • 9 citations

Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale Problems

Saeed Amizadeh, Sara Abdali, Yinheng Li et al.

NEURIPS 2025 • arXiv:2509.15448

How Data Mixing Shapes In-Context Learning: Asymptotic Equivalence for Transformers with MLPs

Samet Demir, Zafer Dogan

NEURIPS 2025 • arXiv:2510.25753

Hyperbolic Genome Embeddings

Raiyan Khan, Philippe Chlenski, Itsik Pe'er

ICLR 2025 • arXiv:2507.21648 • 3 citations

Impact of Layer Norm on Memorization and Generalization in Transformers

Rishi Singhal, Jung-Eun Kim

NEURIPS 2025 • arXiv:2511.10566 • 1 citation

Improving Formal Reasoning of Transformer with State Stack

Kechi Zhang, Ge Li, Jia Li et al.

NEURIPS 2025

Improving Model Representation and Reducing KV Cache via Skip Connections with First Value Heads

Zhoutong Wu, Yuan Zhang, Yiming Dong et al.

NEURIPS 2025 • arXiv:2510.16807

Kolmogorov-Arnold Transformer

Xingyi Yang, Xinchao Wang

ICLR 2025 • arXiv:2409.10594 • 92 citations

Lambda-Skip Connections: the architectural component that prevents Rank Collapse

Federico Arangath Joseph, Jerome Sieber, Melanie Zeilinger et al.

ICLR 2025 • arXiv:2410.10609 • 2 citations

Language Models Are Implicitly Continuous

Samuele Marro, Davide Evangelista, X. Huang et al.

ICLR 2025 • arXiv:2504.03933 • 3 citations

Learning Crossmodal Interaction Patterns via Attributed Bipartite Graphs for Single-Cell Omics

Xiaotang Wang, Xuanwei Lin, Yun Zhu et al.

NEURIPS 2025

LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding

Shen Zhang, Siyuan Liang, Yaning Tan et al.

NEURIPS 2025 • arXiv:2503.04344 • 1 citation

Length Generalization via Auxiliary Tasks

Pranjal Awasthi, Anupam Gupta, Ravi Kumar

NEURIPS 2025

Limitations of Normalization in Attention

Timur Mudarisov, Mikhail Burtsev, Tatiana Petrova et al.

NEURIPS 2025 • arXiv:2508.17821 • 2 citations

Linear Differential Vision Transformer: Learning Visual Contrasts via Pairwise Differentials

Yifan Pu, Jixuan Ying, Qixiu Li et al.

NEURIPS 2025 • arXiv:2511.00833 • 1 citation

LIRM: Large Inverse Rendering Model for Progressive Reconstruction of Shape, Materials and View-dependent Radiance Fields

Zhengqin Li, Dilin Wang, Ka Chen et al.

CVPR 2025 • arXiv:2504.20026 • 7 citations

LMFusion: Adapting Pretrained Language Models for Multimodal Generation

Weijia Shi, Xiaochuang Han, Chunting Zhou et al.

NEURIPS 2025 • arXiv:2412.15188 • 86 citations

Longhorn: State Space Models are Amortized Online Learners

Bo Liu, Rui Wang, Lemeng Wu et al.

ICLR 2025 • arXiv:2407.14207 • 31 citations

LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias

Haian Jin, Hanwen Jiang, Hao Tan et al.

ICLR 2025 • arXiv:2410.17242 • 96 citations

Meta-Learning an In-Context Transformer Model of Human Higher Visual Cortex

Muquan Yu, Mu Nan, Hossein Adeli et al.

NEURIPS 2025 • arXiv:2505.15813

MI-DETR: An Object Detection Model with Multi-time Inquiries Mechanism

Zhixiong Nan, Xianghong Li, Tao Xiang et al.

CVPR 2025 • arXiv:2503.01463 • 7 citations

Mimic In-Context Learning for Multimodal Tasks

Yuchu Jiang, Jiale Fu, Chenduo Hao et al.

CVPR 2025 • arXiv:2504.08851 • 9 citations

MIND over Body: Adaptive Thinking using Dynamic Computation

Mrinal Mathur, Barak Pearlmutter, Sergey Plis

ICLR 2025 • 2 citations

MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm

Ziyan Guo, Zeyu Hu, Na Zhao et al.

ICCV 2025 • arXiv:2502.02358 • 12 citations

MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation

Aviral Chharia, Wenbo Gou, Haoye Dong

CVPR 2025 • arXiv:2509.00649 • 4 citations

Neural Collapse is Globally Optimal in Deep Regularized ResNets and Transformers

Peter Súkeník, Christoph Lampert, Marco Mondelli

NEURIPS 2025 • arXiv:2505.15239 • 4 citations

NN-Former: Rethinking Graph Structure in Neural Architecture Representation

Ruihan Xu, Haokui Zhang, Yaowei Wang et al.

CVPR 2025 • arXiv:2507.00880 • 1 citation

nnWNet: Rethinking the Use of Transformers in Biomedical Image Segmentation and Calling for a Unified Evaluation Benchmark

Yanfeng Zhou, Lingrui Li, Le Lu et al.

CVPR 2025 • 11 citations

Normalization in Attention Dynamics

Nikita Karagodin, Shu Ge, Yury Polyanskiy et al.

NEURIPS 2025 • arXiv:2510.22026 • 3 citations

One-Minute Video Generation with Test-Time Training

Jiarui Xu, Shihao Han, Karan Dalal et al.

CVPR 2025 • arXiv:2504.05298 • 67 citations

On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery

Renpu Liu, Ruida Zhou, Cong Shen et al.

ICLR 2025 • arXiv:2410.13981 • 4 citations

On the Optimization and Generalization of Multi-head Attention

Christos Thrampoulidis, Rouzbeh Ghaderi, Hossein Taheri et al.

ICLR 2025 • arXiv:2310.12680 • 44 citations

Optimal Control for Transformer Architectures: Enhancing Generalization, Robustness and Efficiency

Kelvin Kan, Xingjian Li, Benjamin Zhang et al.

NEURIPS 2025 • arXiv:2505.13499 • 3 citations

Optimal Dynamic Regret by Transformers for Non-Stationary Reinforcement Learning

Baiyuan Chen, Shinji Ito, Masaaki Imaizumi

NEURIPS 2025 • arXiv:2508.16027

OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer

Jinyang Li, En Yu, Sijia Chen et al.

ICLR 2025 • arXiv:2503.10616 • 8 citations

Point-MaDi: Masked Autoencoding with Diffusion for Point Cloud Pre-training

Xiaoyang Xiao, Runzhao Yao, Zhiqiang Tian et al.

NEURIPS 2025

Point-SAM: Promptable 3D Segmentation Model for Point Clouds

Yuchen Zhou, Jiayuan Gu, Tung Chiang et al.

ICLR 2025 • arXiv:2406.17741 • 44 citations

Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models

Zhijian Zhuo, Ya Wang, Yutao Zeng et al.

ICLR 2025 • arXiv:2411.03884 • 6 citations

Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors

Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy et al.

CVPR 2025 • arXiv:2503.17316 • 35 citations

Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation

Yang Tian, Sizhe Yang, Jia Zeng et al.

ICLR 2025 • arXiv:2412.15109 • 93 citations