"transformer architecture" Papers

335 papers found • Page 2 of 7

EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation

Zihao Zhang, Haoran Chen, Haoyu Zhao et al.

CVPR 2025 • arXiv:2503.15831
10 citations

Efficient Attention-Sharing Information Distillation Transformer for Lightweight Single Image Super-Resolution

Karam Park, Jae Woong Soh, Nam Ik Cho

AAAI 2025 • arXiv:2501.15774
10 citations

Efficient Concertormer for Image Deblurring and Beyond

Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien et al.

ICCV 2025 • arXiv:2404.06135

Efficient Time Series Processing for Transformers and State-Space Models through Token Merging

Leon Götz, Marcel Kollovieh, Stephan Günnemann et al.

ICML 2025 • arXiv:2405.17951
5 citations

Emergence of Linear Truth Encodings in Language Models

Shauli Ravfogel, Gilad Yehudai, Tal Linzen et al.

NEURIPS 2025 • arXiv:2510.15804
3 citations

End-to-End HOI Reconstruction Transformer with Graph-based Encoding

Zhenrong Wang, Qi Zheng, Sihan Ma et al.

CVPR 2025 (highlight) • arXiv:2503.06012
1 citation

End-to-End Implicit Neural Representations for Classification

Alexander Gielisse, Jan van Gemert

CVPR 2025 • arXiv:2503.18123
4 citations

Enhancing 3D Gaze Estimation in the Wild using Weak Supervision with Gaze Following Labels

Pierre Vuillecard, Jean-Marc Odobez

CVPR 2025 • arXiv:2502.20249
7 citations

Enhancing Image Restoration Transformer via Adaptive Translation Equivariance

JiaKui Hu, Zhengjian Yao, Lujia Jin et al.

ICCV 2025 • arXiv:2506.18520
3 citations

Enhancing Transformers Through Conditioned Embedded Tokens

Hemanth Saratchandran, Simon Lucey

ICCV 2025 • arXiv:2505.12789
2 citations

ESCAPE: Equivariant Shape Completion via Anchor Point Encoding

Burak Bekci, Nassir Navab, Federico Tombari et al.

CVPR 2025 • arXiv:2412.00952
4 citations

Evolutionary Reasoning Does Not Arise in Standard Usage of Protein Language Models

Yasha Ektefaie, Andrew Shen, Lavik Jain et al.

NEURIPS 2025

Exact Expressive Power of Transformers with Padding

Will Merrill, Ashish Sabharwal

NEURIPS 2025 • arXiv:2505.18948
7 citations

FaceXFormer: A Unified Transformer for Facial Analysis

Kartik Narayan, Vibashan VS, Rama Chellappa et al.

ICCV 2025 • arXiv:2403.12960
37 citations

FFN Fusion: Rethinking Sequential Computation in Large Language Models

Akhiad Bercovich, Mohammed Dabbah, Omri Puny et al.

NEURIPS 2025 (spotlight) • arXiv:2503.18908
2 citations

First Attentions Last: Better Exploiting First Attentions for Efficient Parallel Training

Gyudong Kim, Hyukju Na, Jin Kim et al.

NEURIPS 2025

Flatten Graphs as Sequences: Transformers are Scalable Graph Generators

Dexiong Chen, Markus Krimmel, Karsten Borgwardt

NEURIPS 2025 • arXiv:2502.02216
4 citations

Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization

Kyle Sargent, Kyle Hsu, Justin Johnson et al.

ICCV 2025 • arXiv:2503.11056
27 citations

FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction

Jiale Xu, Shenghua Gao, Ying Shan

ICCV 2025 • arXiv:2412.09573
19 citations

From Attention to Activation: Unraveling the Enigmas of Large Language Models

Prannay Kaul, Chengcheng Ma, Ismail Elezi et al.

ICLR 2025 • arXiv:2410.17174
8 citations

Gemstones: A Model Suite for Multi-Faceted Scaling Laws

Sean McLeish, John Kirchenbauer, David Miller et al.

NEURIPS 2025 • arXiv:2502.06857
10 citations

Generation as Search Operator for Test-Time Scaling of Diffusion-based Combinatorial Optimization

Yang Li, Lvda Chen, Haonan Wang et al.

NEURIPS 2025

GeoAggregator: An Efficient Transformer Model for Geo-Spatial Tabular Data

Rui Deng, Ziqi Li, Mingshu Wang

AAAI 2025 • arXiv:2502.15032
1 citation

GEOPARD: Geometric Pretraining for Articulation Prediction in 3D Shapes

Pradyumn Goyal, Dmitrii Petrov, Sheldon Andrews et al.

ICCV 2025 • arXiv:2504.02747
3 citations

Grammar Reinforcement Learning: path and cycle counting in graphs with a Context-Free Grammar and Transformer approach

Jason Piquenot, Maxime Berar, Romain Raveaux et al.

ICLR 2025

HeatFormer: A Neural Optimizer for Multiview Human Mesh Recovery

Yuto Matsubara, Ko Nishino

CVPR 2025 • arXiv:2412.04456
1 citation

HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts

Neil He, Rishabh Anand, Hiren Madhu et al.

NEURIPS 2025 • arXiv:2505.24722
9 citations

Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale Problems

Saeed Amizadeh, Sara Abdali, Yinheng Li et al.

NEURIPS 2025 • arXiv:2509.15448

How Data Mixing Shapes In-Context Learning: Asymptotic Equivalence for Transformers with MLPs

Samet Demir, Zafer Dogan

NEURIPS 2025 • arXiv:2510.25753

Hyperbolic Genome Embeddings

Raiyan Khan, Philippe Chlenski, Itsik Pe'er

ICLR 2025 • arXiv:2507.21648
3 citations

Impact of Layer Norm on Memorization and Generalization in Transformers

Rishi Singhal, Jung-Eun Kim

NEURIPS 2025 • arXiv:2511.10566
1 citation

Improving Formal Reasoning of Transformer with State Stack

Kechi Zhang, Ge Li, Jia Li et al.

NEURIPS 2025

Improving Model Representation and Reducing KV Cache via Skip Connections with First Value Heads

Zhoutong Wu, Yuan Zhang, Yiming Dong et al.

NEURIPS 2025 • arXiv:2510.16807

Improving Transformer Based Line Segment Detection with Matched Predicting and Re-ranking

Xin Tong, Shi Peng, Baojie Tian et al.

AAAI 2025 • arXiv:2502.17766
2 citations

InterMask: 3D Human Interaction Generation via Collaborative Masked Modeling

Muhammad Gohar Javed, Chuan Guo, Li Cheng et al.

ICLR 2025 (oral) • arXiv:2410.10010
27 citations

JoVALE: Detecting Human Actions in Video Using Audiovisual and Language Contexts

Taein Son, Soo Won Seo, Jisong Kim et al.

AAAI 2025 • arXiv:2412.13708
2 citations

Kolmogorov-Arnold Transformer

Xingyi Yang, Xinchao Wang

ICLR 2025 • arXiv:2409.10594
92 citations

Lambda-Skip Connections: the architectural component that prevents Rank Collapse

Federico Arangath Joseph, Jerome Sieber, Melanie Zeilinger et al.

ICLR 2025 • arXiv:2410.10609
2 citations

Language Models Are Implicitly Continuous

Samuele Marro, Davide Evangelista, X. Huang et al.

ICLR 2025 • arXiv:2504.03933
3 citations

Learning Crossmodal Interaction Patterns via Attributed Bipartite Graphs for Single-Cell Omics

Xiaotang Wang, Xuanwei Lin, Yun Zhu et al.

NEURIPS 2025

LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding

Shen Zhang, Siyuan Liang, Yaning Tan et al.

NEURIPS 2025 • arXiv:2503.04344
1 citation

Length Generalization via Auxiliary Tasks

Pranjal Awasthi, Anupam Gupta, Ravi Kumar

NEURIPS 2025

LightPROF: A Lightweight Reasoning Framework for Large Language Model on Knowledge Graph

Tu Ao, Yanhua Yu, Yuling Wang et al.

AAAI 2025 • arXiv:2504.03137
15 citations

Limitations of Normalization in Attention

Timur Mudarisov, Mikhail Burtsev, Tatiana Petrova et al.

NEURIPS 2025 • arXiv:2508.17821
2 citations

Linear Differential Vision Transformer: Learning Visual Contrasts via Pairwise Differentials

Yifan Pu, Jixuan Ying, Qixiu Li et al.

NEURIPS 2025 • arXiv:2511.00833
1 citation

LIRM: Large Inverse Rendering Model for Progressive Reconstruction of Shape, Materials and View-dependent Radiance Fields

Zhengqin Li, Dilin Wang, Ka Chen et al.

CVPR 2025 • arXiv:2504.20026
7 citations

LMFusion: Adapting Pretrained Language Models for Multimodal Generation

Weijia Shi, Xiaochuang Han, Chunting Zhou et al.

NEURIPS 2025 • arXiv:2412.15188
86 citations

Longhorn: State Space Models are Amortized Online Learners

Bo Liu, Rui Wang, Lemeng Wu et al.

ICLR 2025 • arXiv:2407.14207
31 citations

LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias

Haian Jin, Hanwen Jiang, Hao Tan et al.

ICLR 2025 • arXiv:2410.17242
96 citations

Memory Efficient Matting with Adaptive Token Routing

Yiheng Lin, Yihan Hu, Chenyi Zhang et al.

AAAI 2025 • arXiv:2412.10702