"vision transformers" Papers

122 papers found • Page 1 of 3

A Circular Argument: Does RoPE need to be Equivariant for Vision?

Chase van de Geijn, Timo Lüddecke, Polina Turishcheva et al.

NeurIPS 2025 • arXiv:2511.08368
2 citations

Adaptive Part Learning for Fine-Grained Generalized Category Discovery: A Plug-and-Play Enhancement

Qiyuan Dai, Hanzhuo Huang, Yu Wu et al.

CVPR 2025 • arXiv:2507.06928
7 citations

Alias-Free ViT: Fractional Shift Invariance via Linear Attention

Hagay Michaeli, Daniel Soudry

NeurIPS 2025 • arXiv:2510.22673

A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning

Minyoung Kim, Timothy Hospedales

AAAI 2025 • arXiv:2410.10417
2 citations

A Theoretical Analysis of Self-Supervised Learning for Vision Transformers

Yu Huang, Zixin Wen, Yuejie Chi et al.

ICLR 2025 • arXiv:2403.02233
3 citations

BHViT: Binarized Hybrid Vision Transformer

Tian Gao, Yu Zhang, Zhiyuan Zhang et al.

CVPR 2025 • arXiv:2503.02394
8 citations

BiggerGait: Unlocking Gait Recognition with Layer-wise Representations from Large Vision Models

Dingqiang Ye, Chao Fan, Zhanbo Huang et al.

NeurIPS 2025 • arXiv:2505.18132
6 citations

Boosting ViT-based MRI Reconstruction from the Perspectives of Frequency Modulation, Spatial Purification, and Scale Diversification

Yucong Meng, Zhiwei Yang, Yonghong Shi et al.

AAAI 2025 • arXiv:2412.10776
6 citations

Brain Mapping with Dense Features: Grounding Cortical Semantic Selectivity in Natural Images With Vision Transformers

Andrew Luo, Jacob Yeung, Rushikesh Zawar et al.

ICLR 2025 • arXiv:2410.05266
12 citations

ChA-MAEViT: Unifying Channel-Aware Masked Autoencoders and Multi-Channel Vision Transformers for Improved Cross-Channel Learning

Chau Pham, Juan C. Caicedo, Bryan Plummer

NeurIPS 2025 • arXiv:2503.19331
5 citations

Charm: The Missing Piece in ViT Fine-Tuning for Image Aesthetic Assessment

Fatemeh Behrad, Tinne Tuytelaars, Johan Wagemans

CVPR 2025 • arXiv:2504.02522
3 citations

Configuring Data Augmentations to Reduce Variance Shift in Positional Embedding of Vision Transformers

Bum Jun Kim, Sang Woo Kim

AAAI 2025 • arXiv:2405.14115
2 citations

DA-VPT: Semantic-Guided Visual Prompt Tuning for Vision Transformers

Li Ren, Chen Chen, Liqiang Wang et al.

CVPR 2025 • arXiv:2505.23694
8 citations

Discovering Influential Neuron Path in Vision Transformers

Yifan Wang, Yifei Liu, Yingdong Shi et al.

ICLR 2025 • arXiv:2503.09046
4 citations

DiTASK: Multi-Task Fine-Tuning with Diffeomorphic Transformations

Krishna Sri Ipsit Mantri, Carola-Bibiane Schönlieb, Bruno Ribeiro et al.

CVPR 2025 • arXiv:2502.06029
5 citations

Effective Interplay between Sparsity and Quantization: From Theory to Practice

Simla Harma, Ayan Chakraborty, Elizaveta Kostenok et al.

ICLR 2025 • arXiv:2405.20935
19 citations

Elastic ViTs from Pretrained Models without Retraining

Walter Simoncini, Michael Dorkenwald, Tijmen Blankevoort et al.

NeurIPS 2025 • arXiv:2510.17700

Energy Landscape-Aware Vision Transformers: Layerwise Dynamics and Adaptive Task-Specific Training via Hopfield States

Runze Xia, Richard Jiang

NeurIPS 2025

FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation

Zhuguanyu Wu, Shihe Wang, Jiayi Zhang et al.

CVPR 2025 (highlight) • arXiv:2506.11543
6 citations

FLOPS: Forward Learning with OPtimal Sampling

Tao Ren, Zishi Zhang, Jinyang Jiang et al.

ICLR 2025 • arXiv:2410.05966
2 citations

Generative Medical Segmentation

Jiayu Huo, Xi Ouyang, Sébastien Ourselin et al.

AAAI 2025 • arXiv:2403.18198
5 citations

GPLQ: A General, Practical, and Lightning QAT Method for Vision Transformers

Guang Liang, Xinyao Liu, Jianxin Wu

NeurIPS 2025 • arXiv:2506.11784
4 citations

Hypergraph Vision Transformers: Images are More than Nodes, More than Edges

Joshua Fixelle

CVPR 2025 • arXiv:2504.08710
9 citations

Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement

Yuchen Ren, Zhengyu Zhao, Chenhao Lin et al.

CVPR 2025 • arXiv:2503.15404
5 citations

Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking

You Wu, Xucheng Wang, Xiangyang Yang et al.

CVPR 2025 • arXiv:2504.09228
20 citations

LeGrad: An Explainability Method for Vision Transformers via Feature Formation Sensitivity

Walid Bousselham, Angie Boggust, Sofian Chaybouti et al.

ICCV 2025 • arXiv:2404.03214
25 citations

Lessons and Insights from a Unifying Study of Parameter-Efficient Fine-Tuning (PEFT) in Visual Recognition

Zheda Mai, Ping Zhang, Cheng-Hao Tu et al.

CVPR 2025 (highlight) • arXiv:2409.16434
10 citations

LevAttention: Time, Space and Streaming Efficient Algorithm for Heavy Attentions

Ravindran Kannan, Chiranjib Bhattacharyya, Praneeth Kacham et al.

ICLR 2025 • arXiv:2410.05462
2 citations

Linear Differential Vision Transformer: Learning Visual Contrasts via Pairwise Differentials

Yifan Pu, Jixuan Ying, Qixiu Li et al.

NeurIPS 2025 • arXiv:2511.00833
1 citation

Locality Alignment Improves Vision-Language Models

Ian Covert, Tony Sun, James Y Zou et al.

ICLR 2025 • arXiv:2410.11087
11 citations

LookWhere? Efficient Visual Recognition by Learning Where to Look and What to See from Self-Supervision

Anthony Fuller, Yousef Yassin, Junfeng Wen et al.

NeurIPS 2025 • arXiv:2505.18051
2 citations

L-SWAG: Layer-Sample Wise Activation with Gradients Information for Zero-Shot NAS on Vision Transformers

Sofia Casarin, Sergio Escalera, Oswald Lanz

CVPR 2025 • arXiv:2505.07300
2 citations

LVFace: Progressive Cluster Optimization for Large Vision Models in Face Recognition

Jinghan You, Shanglin Li, Yuanrui Sun et al.

ICCV 2025 (highlight) • arXiv:2501.13420
6 citations

MambaIRv2: Attentive State Space Restoration

Hang Guo, Yong Guo, Yaohua Zha et al.

CVPR 2025 • arXiv:2411.15269
84 citations

MambaOut: Do We Really Need Mamba for Vision?

Weihao Yu, Xinchao Wang

CVPR 2025 • arXiv:2405.07992
193 citations

Metric-Driven Attributions for Vision Transformers

Chase Walker, Sumit Jha, Rickard Ewetz

ICLR 2025
1 citation

MimiQ: Low-Bit Data-Free Quantization of Vision Transformers with Encouraging Inter-Head Attention Similarity

Kanghyun Choi, Hyeyoon Lee, Dain Kwon et al.

AAAI 2025 • arXiv:2407.20021
7 citations

Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction

Giuseppe Cartella, Vittorio Cuculo, Alessandro D'Amelio et al.

ICCV 2025 • arXiv:2507.23021
1 citation

Morphing Tokens Draw Strong Masked Image Models

Taekyung Kim, Byeongho Heo, Dongyoon Han

ICLR 2025 • arXiv:2401.00254
3 citations

Multi-Kernel Correlation-Attention Vision Transformer for Enhanced Contextual Understanding and Multi-Scale Integration

Hongkang Zhang, Shao-Lun Huang, Ercan Kuruoglu et al.

NeurIPS 2025

Mutual Effort for Efficiency: A Similarity-based Token Pruning for Vision Transformers in Self-Supervised Learning

Sheng Li, Qitao Tan, Yue Dai et al.

ICLR 2025

Normalize Filters! Classical Wisdom for Deep Vision

Gustavo Perez, Stella X. Yu

NeurIPS 2025 • arXiv:2506.04401

PatchGuard: Adversarially Robust Anomaly Detection and Localization through Vision Transformers and Pseudo Anomalies

Mojtaba Nafez, Amirhossein Koochakian, Arad Maleki et al.

CVPR 2025 • arXiv:2506.09237
2 citations

PolaFormer: Polarity-aware Linear Attention for Vision Transformers

Weikang Meng, Yadan Luo, Xin Li et al.

ICLR 2025 • arXiv:2501.15061
42 citations

Polyline Path Masked Attention for Vision Transformer

Zhongchen Zhao, Chaodong Xiao, Hui Lin et al.

NeurIPS 2025 (spotlight) • arXiv:2506.15940

Prior-guided Hierarchical Harmonization Network for Efficient Image Dehazing

Xiongfei Su, Siyuan Li, Yuning Cui et al.

AAAI 2025 • arXiv:2503.01136
16 citations

Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis

Arpita Chowdhury, Dipanjyoti Paul, Zheda Mai et al.

CVPR 2025 • arXiv:2501.09333
6 citations

Randomized-MLP Regularization Improves Domain Adaptation and Interpretability in DINOv2

Joel Valdivia Ortega, Lorenz Lamm, Franziska Eckardt et al.

NeurIPS 2025 • arXiv:2511.05509

Register and [CLS] tokens induce a decoupling of local and global features in large ViTs

Alexander Lappe, Martin Giese

NeurIPS 2025
3 citations

Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks

Giyeong Oh, Woohyun Cho, Siyeol Kim et al.

NeurIPS 2025 • arXiv:2505.11881