"vision transformers" Papers
122 papers found • Page 1 of 3
A Circular Argument: Does RoPE need to be Equivariant for Vision?
Chase van de Geijn, Timo Lüddecke, Polina Turishcheva et al.
Adaptive Part Learning for Fine-Grained Generalized Category Discovery: A Plug-and-Play Enhancement
Qiyuan Dai, Hanzhuo Huang, Yu Wu et al.
Alias-Free ViT: Fractional Shift Invariance via Linear Attention
Hagay Michaeli, Daniel Soudry
A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning
Minyoung Kim, Timothy Hospedales
A Theoretical Analysis of Self-Supervised Learning for Vision Transformers
Yu Huang, Zixin Wen, Yuejie Chi et al.
BHViT: Binarized Hybrid Vision Transformer
Tian Gao, Yu Zhang, Zhiyuan Zhang et al.
BiggerGait: Unlocking Gait Recognition with Layer-wise Representations from Large Vision Models
Dingqiang Ye, Chao Fan, Zhanbo Huang et al.
Boosting ViT-based MRI Reconstruction from the Perspectives of Frequency Modulation, Spatial Purification, and Scale Diversification
Yucong Meng, Zhiwei Yang, Yonghong Shi et al.
Brain Mapping with Dense Features: Grounding Cortical Semantic Selectivity in Natural Images With Vision Transformers
Andrew Luo, Jacob Yeung, Rushikesh Zawar et al.
ChA-MAEViT: Unifying Channel-Aware Masked Autoencoders and Multi-Channel Vision Transformers for Improved Cross-Channel Learning
Chau Pham, Juan C. Caicedo, Bryan Plummer
Charm: The Missing Piece in ViT Fine-Tuning for Image Aesthetic Assessment
Fatemeh Behrad, Tinne Tuytelaars, Johan Wagemans
Configuring Data Augmentations to Reduce Variance Shift in Positional Embedding of Vision Transformers
Bum Jun Kim, Sang Woo Kim
DA-VPT: Semantic-Guided Visual Prompt Tuning for Vision Transformers
Li Ren, Chen Chen, Liqiang Wang et al.
Discovering Influential Neuron Path in Vision Transformers
Yifan Wang, Yifei Liu, Yingdong Shi et al.
DiTASK: Multi-Task Fine-Tuning with Diffeomorphic Transformations
Krishna Sri Ipsit Mantri, Carola-Bibiane Schönlieb, Bruno Ribeiro et al.
Effective Interplay between Sparsity and Quantization: From Theory to Practice
Simla Harma, Ayan Chakraborty, Elizaveta Kostenok et al.
Elastic ViTs from Pretrained Models without Retraining
Walter Simoncini, Michael Dorkenwald, Tijmen Blankevoort et al.
Energy Landscape-Aware Vision Transformers: Layerwise Dynamics and Adaptive Task-Specific Training via Hopfield States
Runze Xia, Richard Jiang
FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation
Zhuguanyu Wu, Shihe Wang, Jiayi Zhang et al.
FLOPS: Forward Learning with OPtimal Sampling
Tao Ren, Zishi Zhang, Jinyang Jiang et al.
Generative Medical Segmentation
Jiayu Huo, Xi Ouyang, Sébastien Ourselin et al.
GPLQ: A General, Practical, and Lightning QAT Method for Vision Transformers
Guang Liang, Xinyao Liu, Jianxin Wu
Hypergraph Vision Transformers: Images are More than Nodes, More than Edges
Joshua Fixelle
Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement
Yuchen Ren, Zhengyu Zhao, Chenhao Lin et al.
Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking
You Wu, Xucheng Wang, Xiangyang Yang et al.
LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity
Walid Bousselham, Angie Boggust, Sofian Chaybouti et al.
Lessons and Insights from a Unifying Study of Parameter-Efficient Fine-Tuning (PEFT) in Visual Recognition
Zheda Mai, Ping Zhang, Cheng-Hao Tu et al.
LevAttention: Time, Space and Streaming Efficient Algorithm for Heavy Attentions
Ravindran Kannan, Chiranjib Bhattacharyya, Praneeth Kacham et al.
Linear Differential Vision Transformer: Learning Visual Contrasts via Pairwise Differentials
Yifan Pu, Jixuan Ying, Qixiu Li et al.
Locality Alignment Improves Vision-Language Models
Ian Covert, Tony Sun, James Y Zou et al.
LookWhere? Efficient Visual Recognition by Learning Where to Look and What to See from Self-Supervision
Anthony Fuller, Yousef Yassin, Junfeng Wen et al.
L-SWAG: Layer-Sample Wise Activation with Gradients Information for Zero-Shot NAS on Vision Transformers
Sofia Casarin, Sergio Escalera, Oswald Lanz
LVFace: Progressive Cluster Optimization for Large Vision Models in Face Recognition
Jinghan You, Shanglin Li, Yuanrui Sun et al.
MambaIRv2: Attentive State Space Restoration
Hang Guo, Yong Guo, Yaohua Zha et al.
MambaOut: Do We Really Need Mamba for Vision?
Weihao Yu, Xinchao Wang
Metric-Driven Attributions for Vision Transformers
Chase Walker, Sumit Jha, Rickard Ewetz
MimiQ: Low-Bit Data-Free Quantization of Vision Transformers with Encouraging Inter-Head Attention Similarity
Kanghyun Choi, Hyeyoon Lee, Dain Kwon et al.
Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction
Giuseppe Cartella, Vittorio Cuculo, Alessandro D'Amelio et al.
Morphing Tokens Draw Strong Masked Image Models
Taekyung Kim, Byeongho Heo, Dongyoon Han
Multi-Kernel Correlation-Attention Vision Transformer for Enhanced Contextual Understanding and Multi-Scale Integration
Hongkang Zhang, Shao-Lun Huang, Ercan Kuruoglu et al.
Mutual Effort for Efficiency: A Similarity-based Token Pruning for Vision Transformers in Self-Supervised Learning
Sheng Li, Qitao Tan, Yue Dai et al.
Normalize Filters! Classical Wisdom for Deep Vision
Gustavo Perez, Stella X. Yu
PatchGuard: Adversarially Robust Anomaly Detection and Localization through Vision Transformers and Pseudo Anomalies
Mojtaba Nafez, Amirhossein Koochakian, Arad Maleki et al.
PolaFormer: Polarity-aware Linear Attention for Vision Transformers
Weikang Meng, Yadan Luo, Xin Li et al.
Polyline Path Masked Attention for Vision Transformer
Zhongchen Zhao, Chaodong Xiao, Hui Lin et al.
Prior-guided Hierarchical Harmonization Network for Efficient Image Dehazing
Xiongfei Su, Siyuan Li, Yuning Cui et al.
Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis
Arpita Chowdhury, Dipanjyoti Paul, Zheda Mai et al.
Randomized-MLP Regularization Improves Domain Adaptation and Interpretability in DINOv2
Joel Valdivia Ortega, Lorenz Lamm, Franziska Eckardt et al.
Register and [CLS] tokens induce a decoupling of local and global features in large ViTs
Alexander Lappe, Martin Giese
Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks
Giyeong Oh, Woohyun Cho, Siyeol Kim et al.