Poster "transformer architecture" Papers

257 papers found • Page 2 of 6

First Attentions Last: Better Exploiting First Attentions for Efficient Parallel Training

Gyudong Kim, Hyukju Na, Jin Kim et al.

NEURIPS 2025

Flatten Graphs as Sequences: Transformers are Scalable Graph Generators

Dexiong Chen, Markus Krimmel, Karsten Borgwardt

NEURIPS 2025 • arXiv:2502.02216 • 4 citations

Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization

Kyle Sargent, Kyle Hsu, Justin Johnson et al.

ICCV 2025 • arXiv:2503.11056 • 27 citations

FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction

Jiale Xu, Shenghua Gao, Ying Shan

ICCV 2025 • arXiv:2412.09573 • 19 citations

From Attention to Activation: Unraveling the Enigmas of Large Language Models

Prannay Kaul, Chengcheng Ma, Ismail Elezi et al.

ICLR 2025 • arXiv:2410.17174 • 8 citations

Gemstones: A Model Suite for Multi-Faceted Scaling Laws

Sean McLeish, John Kirchenbauer, David Miller et al.

NEURIPS 2025 • arXiv:2502.06857 • 10 citations

Generation as Search Operator for Test-Time Scaling of Diffusion-based Combinatorial Optimization

Yang Li, Lvda Chen, Haonan Wang et al.

NEURIPS 2025

GEOPARD: Geometric Pretraining for Articulation Prediction in 3D Shapes

Pradyumn Goyal, Dmitrii Petrov, Sheldon Andrews et al.

ICCV 2025 • arXiv:2504.02747 • 3 citations

Grammar Reinforcement Learning: path and cycle counting in graphs with a Context-Free Grammar and Transformer approach

Jason Piquenot, Maxime Berar, Romain Raveaux et al.

ICLR 2025

HeatFormer: A Neural Optimizer for Multiview Human Mesh Recovery

Yuto Matsubara, Ko Nishino

CVPR 2025 • arXiv:2412.04456 • 1 citation

HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts

Neil He, Rishabh Anand, Hiren Madhu et al.

NEURIPS 2025 • arXiv:2505.24722 • 9 citations

Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale Problems

Saeed Amizadeh, Sara Abdali, Yinheng Li et al.

NEURIPS 2025 • arXiv:2509.15448

How Data Mixing Shapes In-Context Learning: Asymptotic Equivalence for Transformers with MLPs

Samet Demir, Zafer Dogan

NEURIPS 2025 • arXiv:2510.25753

Hyperbolic Genome Embeddings

Raiyan Khan, Philippe Chlenski, Itsik Pe'er

ICLR 2025 • arXiv:2507.21648 • 3 citations

Impact of Layer Norm on Memorization and Generalization in Transformers

Rishi Singhal, Jung-Eun Kim

NEURIPS 2025 • arXiv:2511.10566 • 1 citation

Improving Formal Reasoning of Transformer with State Stack

Kechi Zhang, Ge Li, Jia Li et al.

NEURIPS 2025

Improving Model Representation and Reducing KV Cache via Skip Connections with First Value Heads

Zhoutong Wu, Yuan Zhang, Yiming Dong et al.

NEURIPS 2025 • arXiv:2510.16807

Kolmogorov-Arnold Transformer

Xingyi Yang, Xinchao Wang

ICLR 2025 • arXiv:2409.10594 • 92 citations

Lambda-Skip Connections: the architectural component that prevents Rank Collapse

Federico Arangath Joseph, Jerome Sieber, Melanie Zeilinger et al.

ICLR 2025 • arXiv:2410.10609 • 2 citations

Language Models Are Implicitly Continuous

Samuele Marro, Davide Evangelista, X. Huang et al.

ICLR 2025 • arXiv:2504.03933 • 3 citations

Learning Crossmodal Interaction Patterns via Attributed Bipartite Graphs for Single-Cell Omics

Xiaotang Wang, Xuanwei Lin, Yun Zhu et al.

NEURIPS 2025

LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding

Shen Zhang, Siyuan Liang, Yaning Tan et al.

NEURIPS 2025 • arXiv:2503.04344 • 1 citation

Length Generalization via Auxiliary Tasks

Pranjal Awasthi, Anupam Gupta, Ravi Kumar

NEURIPS 2025

Limitations of Normalization in Attention

Timur Mudarisov, Mikhail Burtsev, Tatiana Petrova et al.

NEURIPS 2025 • arXiv:2508.17821 • 2 citations

Linear Differential Vision Transformer: Learning Visual Contrasts via Pairwise Differentials

Yifan Pu, Jixuan Ying, Qixiu Li et al.

NEURIPS 2025 • arXiv:2511.00833 • 1 citation

LIRM: Large Inverse Rendering Model for Progressive Reconstruction of Shape, Materials and View-dependent Radiance Fields

Zhengqin Li, Dilin Wang, Ka Chen et al.

CVPR 2025 • arXiv:2504.20026 • 7 citations

LMFusion: Adapting Pretrained Language Models for Multimodal Generation

Weijia Shi, Xiaochuang Han, Chunting Zhou et al.

NEURIPS 2025 • arXiv:2412.15188 • 86 citations

Longhorn: State Space Models are Amortized Online Learners

Bo Liu, Rui Wang, Lemeng Wu et al.

ICLR 2025 • arXiv:2407.14207 • 31 citations

LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias

Haian Jin, Hanwen Jiang, Hao Tan et al.

ICLR 2025 • arXiv:2410.17242 • 96 citations

Meta-Learning an In-Context Transformer Model of Human Higher Visual Cortex

Muquan Yu, Mu Nan, Hossein Adeli et al.

NEURIPS 2025 • arXiv:2505.15813

MI-DETR: An Object Detection Model with Multi-time Inquiries Mechanism

Zhixiong Nan, Xianghong Li, Tao Xiang et al.

CVPR 2025 • arXiv:2503.01463 • 7 citations

Mimic In-Context Learning for Multimodal Tasks

Yuchu Jiang, Jiale Fu, Chenduo Hao et al.

CVPR 2025 • arXiv:2504.08851 • 9 citations

MIND over Body: Adaptive Thinking using Dynamic Computation

Mrinal Mathur, Barak Pearlmutter, Sergey Plis

ICLR 2025 • 2 citations

MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm

Ziyan Guo, Zeyu Hu, Na Zhao et al.

ICCV 2025 • arXiv:2502.02358 • 12 citations

MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation

Aviral Chharia, Wenbo Gou, Haoye Dong

CVPR 2025 • arXiv:2509.00649 • 4 citations

Neural Collapse is Globally Optimal in Deep Regularized ResNets and Transformers

Peter Súkeník, Christoph Lampert, Marco Mondelli

NEURIPS 2025 • arXiv:2505.15239 • 4 citations

NN-Former: Rethinking Graph Structure in Neural Architecture Representation

Ruihan Xu, Haokui Zhang, Yaowei Wang et al.

CVPR 2025 • arXiv:2507.00880 • 1 citation

nnWNet: Rethinking the Use of Transformers in Biomedical Image Segmentation and Calling for a Unified Evaluation Benchmark

Yanfeng Zhou, Lingrui Li, Le Lu et al.

CVPR 2025 • 11 citations

Normalization in Attention Dynamics

Nikita Karagodin, Shu Ge, Yury Polyanskiy et al.

NEURIPS 2025 • arXiv:2510.22026 • 3 citations

One-Minute Video Generation with Test-Time Training

Jiarui Xu, Shihao Han, Karan Dalal et al.

CVPR 2025 • arXiv:2504.05298 • 67 citations

On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery

Renpu Liu, Ruida Zhou, Cong Shen et al.

ICLR 2025 • arXiv:2410.13981 • 4 citations

On the Optimization and Generalization of Multi-head Attention

Christos Thrampoulidis, Rouzbeh Ghaderi, Hossein Taheri et al.

ICLR 2025 • arXiv:2310.12680 • 44 citations

Optimal Control for Transformer Architectures: Enhancing Generalization, Robustness and Efficiency

Kelvin Kan, Xingjian Li, Benjamin Zhang et al.

NEURIPS 2025 • arXiv:2505.13499 • 3 citations

Optimal Dynamic Regret by Transformers for Non-Stationary Reinforcement Learning

Baiyuan Chen, Shinji Ito, Masaaki Imaizumi

NEURIPS 2025 • arXiv:2508.16027

OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer

Jinyang Li, En Yu, Sijia Chen et al.

ICLR 2025 • arXiv:2503.10616 • 8 citations

Point-MaDi: Masked Autoencoding with Diffusion for Point Cloud Pre-training

Xiaoyang Xiao, Runzhao Yao, Zhiqiang Tian et al.

NEURIPS 2025

Point-SAM: Promptable 3D Segmentation Model for Point Clouds

Yuchen Zhou, Jiayuan Gu, Tung Chiang et al.

ICLR 2025 • arXiv:2406.17741 • 44 citations

Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models

Zhijian Zhuo, Ya Wang, Yutao Zeng et al.

ICLR 2025 • arXiv:2411.03884 • 6 citations

Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors

Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy et al.

CVPR 2025 • arXiv:2503.17316 • 35 citations

Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation

Yang Tian, Sizhe Yang, Jia Zeng et al.

ICLR 2025 • arXiv:2412.15109 • 93 citations