"vision transformers" Papers

122 papers found • Page 2 of 3

Scalable Neural Network Geometric Robustness Validation via Hölder Optimisation

Yanghao Zhang, Panagiotis Kouvaros, Alessio Lomuscio

NEURIPS 2025

Semantic Alignment and Reinforcement for Data-Free Quantization of Vision Transformers

Yunshan Zhong, Yuyao Zhou, Yuxin Zhang et al.

ICCV 2025 · arXiv:2412.16553

Sinusoidal Initialization, Time for a New Start

Alberto Fernandez-Hernandez, Jose Mestre, Manuel F. Dolz et al.

NEURIPS 2025 · arXiv:2505.12909
1 citation

SonoGym: High Performance Simulation for Challenging Surgical Tasks with Robotic Ultrasound

Yunke Ao, Masoud Moghani, Mayank Mittal et al.

NEURIPS 2025 · arXiv:2507.01152
1 citation

Spectral State Space Model for Rotation-Invariant Visual Representation Learning

Sahar Dastani, Ali Bahri, Moslem Yazdanpanah et al.

CVPR 2025 · arXiv:2503.06369
4 citations

Spiking Vision Transformer with Saccadic Attention

Shuai Wang, Malu Zhang, Dehao Zhang et al.

ICLR 2025 (Oral) · arXiv:2502.12677
17 citations

Split Adaptation for Pre-trained Vision Transformers

Lixu Wang, Bingqi Shang, Yi Li et al.

CVPR 2025 · arXiv:2503.00441
2 citations

Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling

Cristian Rodriguez-Opazo, Ehsan Abbasnejad, Damien Teney et al.

ICLR 2025 · arXiv:2405.17139
1 citation

TGBFormer: Transformer-GraphFormer Blender Network for Video Object Detection

Qiang Qi, Xiao Wang

AAAI 2025 · arXiv:2503.13903
5 citations

Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better

Zihang Lai, Andrea Vedaldi

CVPR 2025 (Highlight) · arXiv:2503.19904
4 citations

TRUST: Test-Time Refinement using Uncertainty-Guided SSM Traverses

Sahar Dastani, Ali Bahri, Gustavo Vargas Hakim et al.

NEURIPS 2025 · arXiv:2509.22813

Variance-Based Pruning for Accelerating and Compressing Trained Networks

Uranik Berisha, Jens Mehnert, Alexandru Condurache

ICCV 2025 · arXiv:2507.12988
1 citation

Vision Transformers Don't Need Trained Registers

Nicholas Jiang, Amil Dravid, Alexei Efros et al.

NEURIPS 2025 (Spotlight) · arXiv:2506.08010
15 citations

Vision Transformers with Self-Distilled Registers

Zipeng Yan, Yinjie Chen, Chong Zhou et al.

NEURIPS 2025 (Spotlight) · arXiv:2505.21501
4 citations

ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers

Hanwen Cao, Haobo Lu, Xiaosen Wang et al.

ICCV 2025 · arXiv:2508.12384
1 citation

ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models

Guoyizhe Wei, Rama Chellappa

ICCV 2025 · arXiv:2504.00037
3 citations

VSSD: Vision Mamba with Non-Causal State Space Duality

Yuheng Shi, Mingjia Li, Minjing Dong et al.

ICCV 2025 · arXiv:2407.18559
30 citations

Your Scale Factors are My Weapon: Targeted Bit-Flip Attacks on Vision Transformers via Scale Factor Manipulation

Jialai Wang, Yuxiao Wu, Weiye Xu et al.

CVPR 2025
3 citations

Your ViT is Secretly an Image Segmentation Model

Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans et al.

CVPR 2025 (Highlight) · arXiv:2503.19108
26 citations

Adapting Pretrained ViTs with Convolution Injector for Visuo-Motor Control

Dongyoon Hwang, Byungkun Lee, Hojoon Lee et al.

ICML 2024 · arXiv:2406.06072

Agent Attention: On the Integration of Softmax and Linear Attention

Dongchen Han, Tianzhu Ye, Yizeng Han et al.

ECCV 2024 · arXiv:2312.08874
212 citations

AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers

Reduan Achtibat, Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer et al.

ICML 2024 · arXiv:2402.05602
92 citations

AttnZero: Efficient Attention Discovery for Vision Transformers

Lujun Li, Zimian Wei, Peijie Dong et al.

ECCV 2024
14 citations

AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors

Kaishen Yuan, Zitong Yu, Xin Liu et al.

ECCV 2024 · arXiv:2403.04697
34 citations

A Unified Masked Autoencoder with Patchified Skeletons for Motion Synthesis

Esteve Valls Mascaro, Hyemin Ahn, Dongheui Lee

AAAI 2024 · arXiv:2308.07301
9 citations

Characterizing Model Robustness via Natural Input Gradients

Adrian Rodriguez-Munoz, Tongzhou Wang, Antonio Torralba

ECCV 2024 · arXiv:2409.20139
2 citations

Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget

Johannes Lehner, Benedikt Alkin, Andreas Fürst et al.

AAAI 2024 · arXiv:2304.10520
22 citations

Converting Transformers to Polynomial Form for Secure Inference Over Homomorphic Encryption

Itamar Zimerman, Moran Baruch, Nir Drucker et al.

ICML 2024 · arXiv:2311.08610
22 citations

Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks

Mikkel Jordahn, Pablo Olmos

ICML 2024 · arXiv:2405.01196
4 citations

Denoising Vision Transformers

Jiawei Yang, Katie Luo, Jiefeng Li et al.

ECCV 2024 · arXiv:2401.02957
31 citations

DiffiT: Diffusion Vision Transformers for Image Generation

Ali Hatamizadeh, Jiaming Song, Guilin Liu et al.

ECCV 2024 · arXiv:2312.02139
122 citations

Efficient Multitask Dense Predictor via Binarization

Yuzhang Shang, Dan Xu, Gaowen Liu et al.

CVPR 2024 · arXiv:2405.14136
6 citations

ERQ: Error Reduction for Post-Training Quantization of Vision Transformers

Yunshan Zhong, Jiawei Hu, You Huang et al.

ICML 2024 (Spotlight)

Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches

Qing Yu, Mikihiro Tanaka, Kent Fujiwara

CVPR 2024 · arXiv:2405.04771
16 citations

Fine-grained Local Sensitivity Analysis of Standard Dot-Product Self-Attention

Aaron Havens, Alexandre Araujo, Huan Zhang et al.

ICML 2024

GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features

Luc Sträter, Mohammadreza Salehi, Efstratios Gavves et al.

ECCV 2024 · arXiv:2407.12427
28 citations

Grid-Attention: Enhancing Computational Efficiency of Large Vision Models without Fine-Tuning

Pengyu Li, Biao Wang, Tianchu Guo et al.

ECCV 2024

Improving Interpretation Faithfulness for Vision Transformers

Lijie Hu, Yixin Liu, Ninghao Liu et al.

ICML 2024 (Spotlight) · arXiv:2311.17983
12 citations

Instance-Aware Group Quantization for Vision Transformers

Jaehyeon Moon, Dohyung Kim, Jun Yong Cheon et al.

CVPR 2024 · arXiv:2404.00928
15 citations

KernelWarehouse: Rethinking the Design of Dynamic Convolution

Chao Li, Anbang Yao

ICML 2024 · arXiv:2406.07879
9 citations

Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning

Jihai Zhang, Xiang Lan, Xiaoye Qu et al.

ECCV 2024 · arXiv:2402.11816
5 citations

LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors

Saksham Suri, Matthew Walmer, Kamal Gupta et al.

ECCV 2024 · arXiv:2403.14625
17 citations

LION: Implicit Vision Prompt Tuning

Haixin Wang, Jianlong Chang, Yihang Zhai et al.

AAAI 2024 · arXiv:2303.09992
36 citations

LookupViT: Compressing visual information to a limited number of tokens

Rajat Koner, Gagan Jain, Sujoy Paul et al.

ECCV 2024 · arXiv:2407.12753
16 citations

Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression

Dingyuan Zhang, Dingkang Liang, Zichang Tan et al.

ECCV 2024 · arXiv:2409.00633
4 citations

Mobile Attention: Mobile-Friendly Linear-Attention for Vision Transformers

Zhiyu Yao, Jian Wang, Haixu Wu et al.

ICML 2024

MoEAD: A Parameter-efficient Model for Multi-class Anomaly Detection

Shiyuan Meng, Wenchao Meng, Qihang Zhou et al.

ECCV 2024
14 citations

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

Yiyuan Zhang, Xiaohan Ding, Kaixiong Gong et al.

CVPR 2024 · arXiv:2401.14405
12 citations

Multiscale Vision Transformers Meet Bipartite Matching for Efficient Single-stage Action Localization

Ioanna Ntinou, Enrique Sanchez, Georgios Tzimiropoulos

CVPR 2024 · arXiv:2312.17686
7 citations

One Meta-tuned Transformer is What You Need for Few-shot Learning

Xu Yang, Huaxiu Yao, Ying Wei

ICML 2024 (Spotlight)