"attention mechanism" Papers

390 papers found • Page 2 of 8

Dependency Parsing is More Parameter-Efficient with Normalization

Paolo Gajo, Domenic Rosati, Hassan Sajjad et al.

NEURIPS 2025 • arXiv:2505.20215

Devil is in the Detail: Towards Injecting Fine Details of Image Prompt in Image Generation via Conflict-free Guidance and Stratified Attention

Kyungmin Jo, Jooyeol Yun, Jaegul Choo

CVPR 2025 • arXiv:2508.02004
2 citations

Devil is in the Uniformity: Exploring Diverse Learners within Transformer for Image Restoration

Shihao Zhou, Dayu Li, Jinshan Pan et al.

ICCV 2025 • arXiv:2503.20174
1 citation

DiCo: Revitalizing ConvNets for Scalable and Efficient Diffusion Modeling

Yuang Ai, Qihang Fan, Xuefeng Hu et al.

NEURIPS 2025 (spotlight) • arXiv:2505.11196
1 citation

Differential Transformer

Tianzhu Ye, Li Dong, Yuqing Xia et al.

ICLR 2025 • arXiv:2410.05258

Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient

George Wang, Jesse Hoogland, Stan van Wingerden et al.

ICLR 2025 • arXiv:2410.02984
24 citations

DiffSim: Taming Diffusion Models for Evaluating Visual Similarity

Yiren Song, Xiaokang Liu, Mike Zheng Shou

ICCV 2025 • arXiv:2412.14580
9 citations

DIFFSSR: Stereo Image Super-resolution Using Differential Transformer

Dafeng Zhang

NEURIPS 2025

Diffusion-Based Imaginative Coordination for Bimanual Manipulation

Huilin Xu, Jian Ding, Jiakun Xu et al.

ICCV 2025 • arXiv:2507.11296
2 citations

Diffusion Transformer Captures Spatial-Temporal Dependencies: A Theory for Gaussian Process Data

Hengyu Fu, Zehao Dou, Jiawei Guo et al.

ICLR 2025 (oral) • arXiv:2407.16134
3 citations

Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection

Jia Guo, Shuai Lu, Weihang Zhang et al.

CVPR 2025 • arXiv:2405.14325
56 citations

Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation

Chanyoung Kim, Dayun Ju, Woojung Han et al.

CVPR 2025 • arXiv:2411.17150
10 citations

Distributional Associations vs In-Context Reasoning: A Study of Feed-forward and Attention Layers

Lei Chen, Joan Bruna, Alberto Bietti

ICLR 2025 • arXiv:2406.03068
8 citations

DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation

Minghong Cai, Xiaodong Cun, Xiaoyu Li et al.

CVPR 2025 • arXiv:2412.18597
46 citations

DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models

Saeed Ranjbar Alvar, Gursimran Singh, Mohammad Akbari et al.

CVPR 2025 • arXiv:2503.02175
57 citations

DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation

Hongbin Lin, Zilu Guo, Yifan Zhang et al.

CVPR 2025 • arXiv:2503.11122
12 citations

DriveTransformer: Unified Transformer for Scalable End-to-End Autonomous Driving

Xiaosong Jia, Junqi You, Zhiyuan Zhang et al.

ICLR 2025 (oral) • arXiv:2503.07656
70 citations

Dual Conditioned Motion Diffusion for Pose-Based Video Anomaly Detection

Hongsong Wang, Andi Xu, Pinle Ding et al.

AAAI 2025 • arXiv:2412.17210
6 citations

DuetGraph: Coarse-to-Fine Knowledge Graph Reasoning with Dual-Pathway Global-Local Fusion

Jin Li, Zezhong Ding, Xike Xie

NEURIPS 2025 • arXiv:2507.11229
1 citation

DuSA: Fast and Accurate Dual-Stage Sparse Attention Mechanism Accelerating Both Training and Inference

Chong Wu, Jiawang Cao, Renjie Xu et al.

NEURIPS 2025

DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability

Xirui Hu, Jiahao Wang, Hao Chen et al.

ICCV 2025 • arXiv:2503.06505
8 citations

Easi3R: Estimating Disentangled Motion from DUSt3R Without Training

Xingyu Chen, Yue Chen, Yuliang Xiu et al.

ICCV 2025 • arXiv:2503.24391
48 citations

EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation

Rang Meng, Xingyu Zhang, Yuming Li et al.

CVPR 2025 • arXiv:2411.10061
55 citations

EDCFlow: Exploring Temporally Dense Difference Maps for Event-based Optical Flow Estimation

Daikun Liu, Lei Cheng, Teng Wang et al.

CVPR 2025 • arXiv:2506.03512
3 citations

EdgeTAM: On-Device Track Anything Model

Chong Zhou, Chenchen Zhu, Yunyang Xiong et al.

CVPR 2025 • arXiv:2501.07256
9 citations

Efficient Attention-Sharing Information Distillation Transformer for Lightweight Single Image Super-Resolution

Karam Park, Jae Woong Soh, Nam Ik Cho

AAAI 2025 • arXiv:2501.15774
10 citations

Enhancing Document Understanding with Group Position Embedding: A Novel Approach to Incorporate Layout Information

Yuke Zhu, Yue Zhang, Dongdong Liu et al.

ICLR 2025
2 citations

Enhancing Image Restoration Transformer via Adaptive Translation Equivariance

JiaKui Hu, Zhengjian Yao, Lujia Jin et al.

ICCV 2025 • arXiv:2506.18520
3 citations

Enhancing Masked Time-Series Modeling via Dropping Patches

Tianyu Qiu, Yi Xie, Hao Niu et al.

AAAI 2025 • arXiv:2412.15315
4 citations

Enhancing Multimodal Large Language Models Complex Reasoning via Similarity Computation

Xiaofeng Zhang, Fanshuo Zeng, Yihao Quan et al.

AAAI 2025 • arXiv:2412.09817

Enhancing Training Data Attribution with Representational Optimization

Weiwei Sun, Haokun Liu, Nikhil Kandpal et al.

NEURIPS 2025 (spotlight) • arXiv:2505.18513

Enhancing Transformers Through Conditioned Embedded Tokens

Hemanth Saratchandran, Simon Lucey

ICCV 2025 • arXiv:2505.12789
2 citations

Entropy Rectifying Guidance for Diffusion and Flow Models

Tariq Berrada Ifriqi, Adriana Romero-Soriano, Michal Drozdzal et al.

NEURIPS 2025 • arXiv:2504.13987
3 citations

Everything is Editable: Extend Knowledge Editing to Unstructured Data in Large Language Models

Jingcheng Deng, Zihao Wei, Liang Pang et al.

ICLR 2025 • arXiv:2405.15349
8 citations

Exact Expressive Power of Transformers with Padding

Will Merrill, Ashish Sabharwal

NEURIPS 2025 • arXiv:2505.18948
7 citations

Exploring Diffusion Transformer Designs via Grafting

Keshigeyan Chandrasegaran, Michael Poli, Dan Fu et al.

NEURIPS 2025 (oral) • arXiv:2506.05340
5 citations

FFN Fusion: Rethinking Sequential Computation in Large Language Models

Akhiad Bercovich, Mohammed Dabbah, Omri Puny et al.

NEURIPS 2025 (spotlight) • arXiv:2503.18908
2 citations

First Attentions Last: Better Exploiting First Attentions for Efficient Parallel Training

Gyudong Kim, Hyukju Na, Jin Kim et al.

NEURIPS 2025

FLAME: Fast Long-context Adaptive Memory for Event-based Vision

Biswadeep Chakraborty, Saibal Mukhopadhyay

NEURIPS 2025 (oral)

FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference

Xunhao Lai, Jianqiao Lu, Yao Luo et al.

ICLR 2025 • arXiv:2502.20766
62 citations

F-LMM: Grounding Frozen Large Multimodal Models

Size Wu, Sheng Jin, Wenwei Zhang et al.

CVPR 2025 • arXiv:2406.05821
22 citations

FlowPrune: Accelerating Attention Flow Calculation by Pruning Flow Network

Shuo Xu, Yu Chen, Shuxia Lin et al.

NEURIPS 2025

From Attention to Activation: Unraveling the Enigmas of Large Language Models

Prannay Kaul, Chengcheng Ma, Ismail Elezi et al.

ICLR 2025 • arXiv:2410.17174
8 citations

From Softmax to Score: Transformers Can Effectively Implement In-Context Denoising Steps

Paul Rosu, Lawrence Carin, Xiang Cheng

NEURIPS 2025

Fully-inductive Node Classification on Arbitrary Graphs

Jianan Zhao, Zhaocheng Zhu, Mikhail Galkin et al.

ICLR 2025 • arXiv:2405.20445
14 citations

Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency

Jerry Yao-Chieh Hu, Wei-Po Wang, Ammar Gilani et al.

ICLR 2025 • arXiv:2411.16525
18 citations

Generalization Guarantees for Representation Learning via Data-Dependent Gaussian Mixture Priors

Milad Sefidgaran, Abdellatif Zaidi, Piotr Krasnowski

ICLR 2025 • arXiv:2502.15540
3 citations

Generating, Fast and Slow: Scalable Parallel Video Generation with Video Interface Networks

Bhishma Dedhia, David Bourgin, Krishna Kumar Singh et al.

ICCV 2025 • arXiv:2503.17539
1 citation

GenesisTex2: Stable, Consistent and High-Quality Text-to-Texture Generation

Jiawei Lu, YingPeng Zhang, Zengjun Zhao et al.

AAAI 2025 • arXiv:2409.18401
7 citations

Glance2Gaze: Efficient Vision-Language Models from Glance Fusion to Gaze Compression

Juan Chen, Honglin Liu, Yingying Ao et al.

NEURIPS 2025