"transformer architecture" Papers

335 papers found • Page 3 of 7

Meta-Learning an In-Context Transformer Model of Human Higher Visual Cortex

Muquan Yu, Mu Nan, Hossein Adeli et al.

NEURIPS 2025 • arXiv:2505.15813

MI-DETR: An Object Detection Model with Multi-time Inquiries Mechanism

Zhixiong Nan, Xianghong Li, Tao Xiang et al.

CVPR 2025 • arXiv:2503.01463
7 citations

MIDI-GPT: A Controllable Generative Model for Computer-Assisted Multitrack Music Composition

Philippe Pasquier, Jeff Ens, Nathan Fradet et al.

AAAI 2025 • arXiv:2501.17011
11 citations

Mimic In-Context Learning for Multimodal Tasks

Yuchu Jiang, Jiale Fu, Chenduo Hao et al.

CVPR 2025 • arXiv:2504.08851
9 citations

MIND over Body: Adaptive Thinking using Dynamic Computation

Mrinal Mathur, Barak Pearlmutter, Sergey Plis

ICLR 2025
2 citations

MoPFormer: Motion-Primitive Transformer for Wearable-Sensor Activity Recognition

Hao Zhang, Zhan Zhuang, Xuehao Wang et al.

NEURIPS 2025 (oral) • arXiv:2505.20744
3 citations

MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm

Ziyan Guo, Zeyu Hu, Na Zhao et al.

ICCV 2025 • arXiv:2502.02358
12 citations

Multifaceted User Modeling in Recommendation: A Federated Foundation Models Approach

Chunxu Zhang, Guodong Long, Hongkuan Guo et al.

AAAI 2025 • arXiv:2412.16969
6 citations

MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation

Aviral Chharia, Wenbo Gou, Haoye Dong

CVPR 2025 • arXiv:2509.00649
4 citations

Neural Collapse is Globally Optimal in Deep Regularized ResNets and Transformers

Peter Súkeník, Christoph Lampert, Marco Mondelli

NEURIPS 2025 • arXiv:2505.15239
4 citations

NN-Former: Rethinking Graph Structure in Neural Architecture Representation

Ruihan Xu, Haokui Zhang, Yaowei Wang et al.

CVPR 2025 • arXiv:2507.00880
1 citation

nnWNet: Rethinking the Use of Transformers in Biomedical Image Segmentation and Calling for a Unified Evaluation Benchmark

Yanfeng Zhou, Lingrui Li, Le Lu et al.

CVPR 2025
11 citations

Non-Markovian Discrete Diffusion with Causal Language Models

Yangtian Zhang, Sizhuang He, Daniel Levine et al.

NEURIPS 2025 (oral) • arXiv:2502.09767
1 citation

Normalization in Attention Dynamics

Nikita Karagodin, Shu Ge, Yury Polyanskiy et al.

NEURIPS 2025 • arXiv:2510.22026
3 citations

Numerical Pruning for Efficient Autoregressive Models

Xuan Shen, Zhao Song, Yufa Zhou et al.

AAAI 2025 • arXiv:2412.12441
23 citations

One-Minute Video Generation with Test-Time Training

Jiarui Xu, Shihao Han, Karan Dalal et al.

CVPR 2025 • arXiv:2504.05298
67 citations

On the Learn-to-Optimize Capabilities of Transformers in In-Context Sparse Recovery

Renpu Liu, Ruida Zhou, Cong Shen et al.

ICLR 2025 • arXiv:2410.13981
4 citations

On the Optimization and Generalization of Multi-head Attention

Christos Thrampoulidis, Rouzbeh Ghaderi, Hossein Taheri et al.

ICLR 2025 • arXiv:2310.12680
44 citations

On the Role of Hidden States of Modern Hopfield Network in Transformer

NEURIPS 2025 • arXiv:2511.20698

Optimal Control for Transformer Architectures: Enhancing Generalization, Robustness and Efficiency

Kelvin Kan, Xingjian Li, Benjamin Zhang et al.

NEURIPS 2025 • arXiv:2505.13499
3 citations

Optimal Dynamic Regret by Transformers for Non-Stationary Reinforcement Learning

Baiyuan Chen, Shinji Ito, Masaaki Imaizumi

NEURIPS 2025 • arXiv:2508.16027

OVTR: End-to-End Open-Vocabulary Multiple Object Tracking with Transformer

Jinyang Li, En Yu, Sijia Chen et al.

ICLR 2025 • arXiv:2503.10616
8 citations

Point-MaDi: Masked Autoencoding with Diffusion for Point Cloud Pre-training

Xiaoyang Xiao, Runzhao Yao, Zhiqiang Tian et al.

NEURIPS 2025

Point-SAM: Promptable 3D Segmentation Model for Point Clouds

Yuchen Zhou, Jiayuan Gu, Tung Chiang et al.

ICLR 2025 • arXiv:2406.17741
44 citations

Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models

Zhijian Zhuo, Ya Wang, Yutao Zeng et al.

ICLR 2025 • arXiv:2411.03884
6 citations

Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors

Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy et al.

CVPR 2025 • arXiv:2503.17316
35 citations

Prediction-Feedback DETR for Temporal Action Detection

Jihwan Kim, Miso Lee, Cheol-Ho Cho et al.

AAAI 2025 • arXiv:2408.16729
6 citations

Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation

Yang Tian, Sizhe Yang, Jia Zeng et al.

ICLR 2025 • arXiv:2412.15109
93 citations

Principles of Visual Tokens for Efficient Video Understanding

Xinyue Hao, Li, Shreyank Gowda et al.

ICCV 2025 • arXiv:2411.13626
1 citation

Prompt-SID: Learning Structural Representation Prompt via Latent Diffusion for Single Image Denoising

Huaqiu Li, Wang Zhang, Xiaowan Hu et al.

AAAI 2025 • arXiv:2502.06432
3 citations

PS3: A Multimodal Transformer Integrating Pathology Reports with Histology Images and Biological Pathways for Cancer Survival Prediction

Manahil Raza, Ayesha Azam, Talha Qaiser et al.

ICCV 2025 • arXiv:2509.20022
1 citation

Quantum Doubly Stochastic Transformers

Jannis Born, Filip Skogh, Kahn Rhrissorrakrai et al.

NEURIPS 2025 (spotlight) • arXiv:2504.16275
2 citations

RAPTR: Radar-based 3D Pose Estimation using Transformer

Sorachi Kato, Ryoma Yataka, Pu Wang et al.

NEURIPS 2025 • arXiv:2511.08387

RaySt3R: Predicting Novel Depth Maps for Zero-Shot Object Completion

Bardienus Duisterhof, Jan Oberst, Bowen Wen et al.

NEURIPS 2025 • arXiv:2506.05285
4 citations

RCTrans: Radar-Camera Transformer via Radar Densifier and Sequential Decoder for 3D Object Detection

Yiheng Li, Yang Yang, Zhen Lei

AAAI 2025 • arXiv:2412.12799
6 citations

Real-Time Calibration Model for Low-Cost Sensor in Fine-Grained Time Series

Seokho Ahn, Hyungjin Kim, Sungbok Shin et al.

AAAI 2025 • arXiv:2412.20170
1 citation

Reconstructing Humans with a Biomechanically Accurate Skeleton

Yan Xia, Xiaowei Zhou, Etienne Vouga et al.

CVPR 2025 • arXiv:2503.21751
7 citations

Rectifying Magnitude Neglect in Linear Attention

Qihang Fan, Huaibo Huang, Yuang Ai et al.

ICCV 2025 (highlight) • arXiv:2507.00698
11 citations

Recurrent Self-Attention Dynamics: An Energy-Agnostic Perspective from Jacobians

Akiyoshi Tomihari, Ryo Karakida

NEURIPS 2025 • arXiv:2505.19458
2 citations

Replacing Paths with Connection-Biased Attention for Knowledge Graph Completion

Sharmishtha Dutta, Alex Gittens, Mohammed J. Zaki et al.

AAAI 2025 • arXiv:2410.00876
5 citations

Representation Entanglement for Generation: Training Diffusion Transformers Is Much Easier Than You Think

Ge Wu, Shen Zhang, Ruijing Shi et al.

NEURIPS 2025 (oral) • arXiv:2507.01467
33 citations

Revisiting Convolution Architecture in the Realm of DNA Foundation Models

Yu Bo, Weian Mao, Daniel Shao et al.

ICLR 2025 • arXiv:2502.18538
4 citations

REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning

Jihyun Lee, Weipeng Xu, Alexander Richard et al.

CVPR 2025 • arXiv:2504.04956
9 citations

RI-MAE: Rotation-Invariant Masked AutoEncoders for Self-Supervised Point Cloud Representation Learning

Kunming Su, Qiuxia Wu, Panpan Cai et al.

AAAI 2025 • arXiv:2409.00353
14 citations

RNNs are not Transformers (Yet): The Key Bottleneck on In-Context Retrieval

Kaiyue Wen, Xingyu Dang, Kaifeng Lyu

ICLR 2025 • arXiv:2402.18510
51 citations

Rope to Nope and Back Again: A New Hybrid Attention Strategy

Bowen Yang, Bharat Venkitesh, Dwaraknath Gnaneshwar Talupuru et al.

NEURIPS 2025 • arXiv:2501.18795
20 citations

SAM 2: Segment Anything in Images and Videos

Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu et al.

ICLR 2025 • arXiv:2408.00714
2393 citations

SAS: Simulated Attention Score

Chuanyang Zheng, Jiankai Sun, Yihang Gao et al.

NEURIPS 2025 • arXiv:2507.07694
2 citations

Scaling Transformers for Low-Bitrate High-Quality Speech Coding

Julian Parker, Anton Smirnov, Jordi Pons et al.

ICLR 2025 • arXiv:2411.19842
62 citations

SCENT: Robust Spatiotemporal Learning for Continuous Scientific Data via Scalable Conditioned Neural Fields

David K Park, Xihaier Luo, Guang Zhao et al.

ICML 2025 (oral) • arXiv:2504.12262
1 citation