"vision transformers" Papers
122 papers found • Page 2 of 3
Scalable Neural Network Geometric Robustness Validation via Hölder Optimisation
Yanghao Zhang, Panagiotis Kouvaros, Alessio Lomuscio
Semantic Alignment and Reinforcement for Data-Free Quantization of Vision Transformers
Yunshan Zhong, Yuyao Zhou, Yuxin Zhang et al.
Sinusoidal Initialization, Time for a New Start
Alberto Fernandez-Hernandez, Jose Mestre, Manuel F. Dolz et al.
SonoGym: High Performance Simulation for Challenging Surgical Tasks with Robotic Ultrasound
Yunke Ao, Masoud Moghani, Mayank Mittal et al.
Spectral State Space Model for Rotation-Invariant Visual Representation Learning
Sahar Dastani, Ali Bahri, Moslem Yazdanpanah et al.
Spiking Vision Transformer with Saccadic Attention
Shuai Wang, Malu Zhang, Dehao Zhang et al.
Split Adaptation for Pre-trained Vision Transformers
Lixu Wang, Bingqi Shang, Yi Li et al.
Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling
Cristian Rodriguez-Opazo, Ehsan Abbasnejad, Damien Teney et al.
TGBFormer: Transformer-GraphFormer Blender Network for Video Object Detection
Qiang Qi, Xiao Wang
Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better
Zihang Lai, Andrea Vedaldi
TRUST: Test-Time Refinement using Uncertainty-Guided SSM Traverses
Sahar Dastani, Ali Bahri, Gustavo Vargas Hakim et al.
Variance-Based Pruning for Accelerating and Compressing Trained Networks
Uranik Berisha, Jens Mehnert, Alexandru Condurache
Vision Transformers Don't Need Trained Registers
Nicholas Jiang, Amil Dravid, Alexei Efros et al.
Vision Transformers with Self-Distilled Registers
Zipeng Yan, Yinjie Chen, Chong Zhou et al.
ViT-EnsembleAttack: Augmenting Ensemble Models for Stronger Adversarial Transferability in Vision Transformers
Hanwen Cao, Haobo Lu, Xiaosen Wang et al.
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
Guoyizhe Wei, Rama Chellappa
VSSD: Vision Mamba with Non-Causal State Space Duality
Yuheng Shi, Mingjia Li, Minjing Dong et al.
Your Scale Factors are My Weapon: Targeted Bit-Flip Attacks on Vision Transformers via Scale Factor Manipulation
Jialai Wang, Yuxiao Wu, Weiye Xu et al.
Your ViT is Secretly an Image Segmentation Model
Tommie Kerssies, Niccolò Cavagnero, Alexander Hermans et al.
Adapting Pretrained ViTs with Convolution Injector for Visuo-Motor Control
Dongyoon Hwang, Byungkun Lee, Hojoon Lee et al.
Agent Attention: On the Integration of Softmax and Linear Attention
Dongchen Han, Tianzhu Ye, Yizeng Han et al.
AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers
Reduan Achtibat, Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer et al.
AttnZero: Efficient Attention Discovery for Vision Transformers
Lujun Li, Zimian Wei, Peijie Dong et al.
AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors
Kaishen Yuan, Zitong Yu, Xin Liu et al.
A Unified Masked Autoencoder with Patchified Skeletons for Motion Synthesis
Esteve Valls Mascaro, Hyemin Ahn, Dongheui Lee
Characterizing Model Robustness via Natural Input Gradients
Adrian Rodriguez-Munoz, Tongzhou Wang, Antonio Torralba
Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget
Johannes Lehner, Benedikt Alkin, Andreas Fürst et al.
Converting Transformers to Polynomial Form for Secure Inference Over Homomorphic Encryption
Itamar Zimerman, Moran Baruch, Nir Drucker et al.
Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks
Mikkel Jordahn, Pablo Olmos
Denoising Vision Transformers
Jiawei Yang, Katie Luo, Jiefeng Li et al.
DiffiT: Diffusion Vision Transformers for Image Generation
Ali Hatamizadeh, Jiaming Song, Guilin Liu et al.
Efficient Multitask Dense Predictor via Binarization
Yuzhang Shang, Dan Xu, Gaowen Liu et al.
ERQ: Error Reduction for Post-Training Quantization of Vision Transformers
Yunshan Zhong, Jiawei Hu, You Huang et al.
Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches
Qing Yu, Mikihiro Tanaka, Kent Fujiwara
Fine-grained Local Sensitivity Analysis of Standard Dot-Product Self-Attention
Aaron Havens, Alexandre Araujo, Huan Zhang et al.
GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features
Luc Sträter, Mohammadreza Salehi, Efstratios Gavves et al.
Grid-Attention: Enhancing Computational Efficiency of Large Vision Models without Fine-Tuning
Pengyu Li, Biao Wang, Tianchu Guo et al.
Improving Interpretation Faithfulness for Vision Transformers
Lijie Hu, Yixin Liu, Ninghao Liu et al.
Instance-Aware Group Quantization for Vision Transformers
Jaehyeon Moon, Dohyung Kim, Jun Yong Cheon et al.
KernelWarehouse: Rethinking the Design of Dynamic Convolution
Chao Li, Anbang Yao
Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning
Jihai Zhang, Xiang Lan, Xiaoye Qu et al.
LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors
Saksham Suri, Matthew Walmer, Kamal Gupta et al.
LION: Implicit Vision Prompt Tuning
Haixin Wang, Jianlong Chang, Yihang Zhai et al.
LookupViT: Compressing visual information to a limited number of tokens
Rajat Koner, Gagan Jain, Sujoy Paul et al.
Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression
Dingyuan Zhang, Dingkang Liang, Zichang Tan et al.
Mobile Attention: Mobile-Friendly Linear-Attention for Vision Transformers
Zhiyu Yao, Jian Wang, Haixu Wu et al.
MoEAD: A Parameter-efficient Model for Multi-class Anomaly Detection
Shiyuan Meng, Wenchao Meng, Qihang Zhou et al.
Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
Yiyuan Zhang, Xiaohan Ding, Kaixiong Gong et al.
Multiscale Vision Transformers Meet Bipartite Matching for Efficient Single-stage Action Localization
Ioanna Ntinou, Enrique Sanchez, Georgios Tzimiropoulos
One Meta-tuned Transformer is What You Need for Few-shot Learning
Xu Yang, Huaxiu Yao, Ying Wei