Poster "vision transformers" Papers

96 papers found • Page 2 of 2

ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models

Guoyizhe Wei, Rama Chellappa

ICCV 2025 • arXiv:2504.00037
3 citations

VSSD: Vision Mamba with Non-Causal State Space Duality

Yuheng Shi, Mingjia Li, Minjing Dong et al.

ICCV 2025 • arXiv:2407.18559
30 citations

Your Scale Factors are My Weapon: Targeted Bit-Flip Attacks on Vision Transformers via Scale Factor Manipulation

Jialai Wang, Yuxiao Wu, Weiye Xu et al.

CVPR 2025
3 citations

Adapting Pretrained ViTs with Convolution Injector for Visuo-Motor Control

Dongyoon Hwang, Byungkun Lee, Hojoon Lee et al.

ICML 2024 • arXiv:2406.06072

Agent Attention: On the Integration of Softmax and Linear Attention

Dongchen Han, Tianzhu Ye, Yizeng Han et al.

ECCV 2024 • arXiv:2312.08874
212 citations

AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers

Reduan Achtibat, Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer et al.

ICML 2024 • arXiv:2402.05602
92 citations

AttnZero: Efficient Attention Discovery for Vision Transformers

Lujun Li, Zimian Wei, Peijie Dong et al.

ECCV 2024
14 citations

AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors

Kaishen Yuan, Zitong Yu, Xin Liu et al.

ECCV 2024 • arXiv:2403.04697
34 citations

Characterizing Model Robustness via Natural Input Gradients

Adrian Rodriguez-Munoz, Tongzhou Wang, Antonio Torralba

ECCV 2024 • arXiv:2409.20139
2 citations

Converting Transformers to Polynomial Form for Secure Inference Over Homomorphic Encryption

Itamar Zimerman, Moran Baruch, Nir Drucker et al.

ICML 2024 • arXiv:2311.08610
22 citations

Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks

Mikkel Jordahn, Pablo Olmos

ICML 2024 • arXiv:2405.01196
4 citations

Denoising Vision Transformers

Jiawei Yang, Katie Luo, Jiefeng Li et al.

ECCV 2024 • arXiv:2401.02957
31 citations

DiffiT: Diffusion Vision Transformers for Image Generation

Ali Hatamizadeh, Jiaming Song, Guilin Liu et al.

ECCV 2024 • arXiv:2312.02139
122 citations

Efficient Multitask Dense Predictor via Binarization

Yuzhang Shang, Dan Xu, Gaowen Liu et al.

CVPR 2024 • arXiv:2405.14136
6 citations

Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches

Qing Yu, Mikihiro Tanaka, Kent Fujiwara

CVPR 2024 • arXiv:2405.04771
16 citations

Fine-grained Local Sensitivity Analysis of Standard Dot-Product Self-Attention

Aaron Havens, Alexandre Araujo, Huan Zhang et al.

ICML 2024

GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features

Luc Sträter, Mohammadreza Salehi, Efstratios Gavves et al.

ECCV 2024 • arXiv:2407.12427
28 citations

Grid-Attention: Enhancing Computational Efficiency of Large Vision Models without Fine-Tuning

Pengyu Li, Biao Wang, Tianchu Guo et al.

ECCV 2024

Instance-Aware Group Quantization for Vision Transformers

Jaehyeon Moon, Dohyung Kim, Jun Yong Cheon et al.

CVPR 2024 • arXiv:2404.00928
15 citations

KernelWarehouse: Rethinking the Design of Dynamic Convolution

Chao Li, Anbang Yao

ICML 2024 • arXiv:2406.07879
9 citations

Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning

Jihai Zhang, Xiang Lan, Xiaoye Qu et al.

ECCV 2024 • arXiv:2402.11816
5 citations

LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors

Saksham Suri, Matthew Walmer, Kamal Gupta et al.

ECCV 2024 • arXiv:2403.14625
17 citations

LookupViT: Compressing visual information to a limited number of tokens

Rajat Koner, Gagan Jain, Sujoy Paul et al.

ECCV 2024 • arXiv:2407.12753
16 citations

Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression

Dingyuan Zhang, Dingkang Liang, Zichang Tan et al.

ECCV 2024 • arXiv:2409.00633
4 citations

Mobile Attention: Mobile-Friendly Linear-Attention for Vision Transformers

Zhiyu Yao, Jian Wang, Haixu Wu et al.

ICML 2024

MoEAD: A Parameter-efficient Model for Multi-class Anomaly Detection

Shiyuan Meng, Wenchao Meng, Qihang Zhou et al.

ECCV 2024
14 citations

Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

Yiyuan Zhang, Xiaohan Ding, Kaixiong Gong et al.

CVPR 2024 • arXiv:2401.14405
12 citations

Multiscale Vision Transformers Meet Bipartite Matching for Efficient Single-stage Action Localization

Ioanna Ntinou, Enrique Sanchez, Georgios Tzimiropoulos

CVPR 2024 • arXiv:2312.17686
7 citations

PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference

Tanvir Mahmud, Burhaneddin Yaman, Chun-Hao Liu et al.

ECCV 2024 • arXiv:2403.16020
7 citations

PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers

Ananthu Aniraj, Cassio F. Dantas, Dino Ienco et al.

ECCV 2024 • arXiv:2407.04538
6 citations

Phase Concentration and Shortcut Suppression for Weakly Supervised Semantic Segmentation

Hoyong Kwon, Jaeseok Jeong, Sung-Hoon Yoon et al.

ECCV 2024

Probabilistic Conceptual Explainers: Trustworthy Conceptual Explanations for Vision Foundation Models

Hengyi Wang, Shiwei Tan, Hao Wang

ICML 2024 • arXiv:2406.12649
9 citations

Removing Rows and Columns of Tokens in Vision Transformer enables Faster Dense Prediction without Retraining

Diwei Su, Cheng Fei, Jianxu Luo

ECCV 2024
2 citations

Revealing the Dark Secrets of Extremely Large Kernel ConvNets on Robustness

Honghao Chen, Yurong Zhang, Xiaokun Feng et al.

ICML 2024 • arXiv:2407.08972
10 citations

Robustness Tokens: Towards Adversarial Robustness of Transformers

Brian Pulfer, Yury Belousov, Slava Voloshynovskiy

ECCV 2024 • arXiv:2503.10191

Rotation-Agnostic Image Representation Learning for Digital Pathology

Saghir Alfasly, Abubakr Shafique, Peyman Nejat et al.

CVPR 2024 • arXiv:2311.08359
19 citations

SNP: Structured Neuron-level Pruning to Preserve Attention Scores

Kyunghwan Shim, Jaewoong Yun, Shinkook Choi

ECCV 2024 • arXiv:2404.11630
3 citations

Sparse Model Inversion: Efficient Inversion of Vision Transformers for Data-Free Applications

Zixuan Hu, Yongxian Wei, Li Shen et al.

ICML 2024 • arXiv:2510.27186
8 citations

SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization

Xixu Hu, Runkai Zheng, Jindong Wang et al.

ECCV 2024 • arXiv:2402.03317
5 citations

Stitched ViTs are Flexible Vision Backbones

Zizheng Pan, Jing Liu, Haoyu He et al.

ECCV 2024 • arXiv:2307.00154
4 citations

Sub-token ViT Embedding via Stochastic Resonance Transformers

Dong Lao, Yangchao Wu, Tian Yu Liu et al.

ICML 2024 • arXiv:2310.03967
7 citations

Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning

Wenlong Deng, Christos Thrampoulidis, Xiaoxiao Li

CVPR 2024 • arXiv:2310.18285
21 citations

VideoMAC: Video Masked Autoencoders Meet ConvNets

Gensheng Pei, Tao Chen, Xiruo Jiang et al.

CVPR 2024 • arXiv:2402.19082
21 citations

Vision Transformers as Probabilistic Expansion from Learngene

Qiufeng Wang, Xu Yang, Haokun Chen et al.

ICML 2024

xT: Nested Tokenization for Larger Context in Large Images

Ritwik Gupta, Shufan Li, Tyler Zhu et al.

ICML 2024 • arXiv:2403.01915
8 citations

Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers

Hongjie Wang, Bhishma Dedhia, Niraj Jha

CVPR 2024 • arXiv:2305.17328
61 citations