"vision transformer" Papers

52 papers found • Page 1 of 2

A Hidden Stumbling Block in Generalized Category Discovery: Distracted Attention

Qiyu Xu, Zhanxuan Hu, Yu Duan et al.

ICCV 2025 • arXiv:2507.14315 • 3 citations

Beyond Token Probes: Hallucination Detection via Activation Tensors with ACT-ViT

Guy Bar-Shalom, Fabrizio Frasca, Yaniv Galron et al.

NeurIPS 2025 • arXiv:2510.00296 • 1 citation

Cross-modal Ship Re-Identification via Optical and SAR Imagery: A Novel Dataset and Method

Han Wang, Shengyang Li, Jian Yang et al.

ICCV 2025 • arXiv:2506.22027 • 6 citations

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

Alexey Bochkovskiy, Amaël Delaunoy, Hugo Germain et al.

ICLR 2025 • arXiv:2410.02073 • 316 citations

Efficient Concertormer for Image Deblurring and Beyond

Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien et al.

ICCV 2025 • arXiv:2404.06135

EMPLACE: Self-Supervised Urban Scene Change Detection

Tim Alpherts, Sennay Ghebreab, Nanne van Noord

AAAI 2025 • arXiv:2503.17716 • 5 citations

Enhancing Vision-Language Model with Unmasked Token Alignment

Hongsheng Li, Jihao Liu, Boxiao Liu et al.

ICLR 2025 • arXiv:2405.19009

EOV-Seg: Efficient Open-Vocabulary Panoptic Segmentation

Hongwei Niu, Jie Hu, Jianghang Lin et al.

AAAI 2025 • arXiv:2412.08628 • 6 citations

FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning

Gaojian Wang, Feng Lin, Tong Wu et al.

CVPR 2025 • arXiv:2412.12032 • 11 citations

GSPN-2: Efficient Parallel Sequence Modeling

Hongjun Wang, Yitong Jiang, Collin McCarthy et al.

NeurIPS 2025 • arXiv:2512.07884

Hybrid Spiking Vision Transformer for Object Detection with Event Cameras

Qi Xu, Jie Deng, Jiangrong Shen et al.

ICML 2025 • oral • arXiv:2505.07715 • 3 citations

On the Role of Hidden States of Modern Hopfield Network in Transformer

NeurIPS 2025 • arXiv:2511.20698

Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens

Qihang Fan, Huaibo Huang, Mingrui Chen et al.

ICCV 2025 • arXiv:2405.13337 • 3 citations

SPA: 3D Spatial-Awareness Enables Effective Embodied Representation

Haoyi Zhu, Honghui Yang, Yating Wang et al.

ICLR 2025 • arXiv:2410.08208 • 24 citations

Sparse autoencoders reveal selective remapping of visual concepts during adaptation

Hyesu Lim, Jinho Choi, Jaegul Choo et al.

ICLR 2025 • arXiv:2412.05276 • 31 citations

TADFormer: Task-Adaptive Dynamic TransFormer for Efficient Multi-Task Learning

Seungmin Baek, Soyul Lee, Hayeon Jo et al.

CVPR 2025 • arXiv:2501.04293 • 1 citation

VITRIX-UniViTAR: Unified Vision Transformer with Native Resolution

Limeng Qiao, Yiyang Gan, Bairui Wang et al.

NeurIPS 2025 • oral • 3 citations

Agglomerative Token Clustering

Joakim Bruslund Haurum, Sergio Escalera, Graham W. Taylor et al.

ECCV 2024 • arXiv:2409.11923 • 7 citations

Attention Guided CAM: Visual Explanations of Vision Transformer Guided by Self-Attention

Saebom Leem, Hyunseok Seo

AAAI 2024 • arXiv:2402.04563 • 32 citations

CLIP-KD: An Empirical Study of CLIP Model Distillation

Chuanguang Yang, Zhulin An, Libo Huang et al.

CVPR 2024 • arXiv:2307.12732 • 86 citations

ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy

Kirill Vishniakov, Zhiqiang Shen, Zhuang Liu

ICML 2024 • arXiv:2311.09215 • 26 citations

Data-free Neural Representation Compression with Riemannian Neural Dynamics

Zhengqi Pei, Anran Zhang, Shuhui Wang et al.

ICML 2024

Enhancing Vision Transformer: Amplifying Non-Linearity in Feedforward Network Module

Yixing Xu, Chao Li, Dong Li et al.

ICML 2024

FairViT: Fair Vision Transformer via Adaptive Masking

Bowei Tian, Ruijie Du, Yanning Shen

ECCV 2024 • arXiv:2407.14799 • 2 citations

FiT: Flexible Vision Transformer for Diffusion Model

Zeyu Lu, ZiDong Wang, Di Huang et al.

ICML 2024 • spotlight • arXiv:2402.12376 • 77 citations

GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer

Ding Jia, Jianyuan Guo, Kai Han et al.

ICML 2024 • arXiv:2406.01210 • 51 citations

HEAL-SWIN: A Vision Transformer On The Sphere

Oscar Carlsson, Jan E. Gerken, Hampus Linander et al.

CVPR 2024 • arXiv:2307.07313 • 14 citations

IFTR: An Instance-Level Fusion Transformer for Visual Collaborative Perception

Shaohong Wang, Lu Bin, Xinyu Xiao et al.

ECCV 2024 • arXiv:2407.09857 • 8 citations

Information Flow in Self-Supervised Learning

Zhiquan Tan, Jingqin Yang, Weiran Huang et al.

ICML 2024 • arXiv:2309.17281 • 17 citations

Learning Adaptive and View-Invariant Vision Transformer for Real-Time UAV Tracking

Yongxin Li, Mengyuan Liu, You Wu et al.

ICML 2024

Learning with Unmasked Tokens Drives Stronger Vision Learners

Taekyung Kim, Sanghyuk Chun, Byeongho Heo et al.

ECCV 2024 • arXiv:2310.13593 • 3 citations

Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification

Pingping Zhang, Yuhao Wang, Yang Liu et al.

CVPR 2024 • arXiv:2403.10254 • 49 citations

MedSegDiff-V2: Diffusion-based Medical Image Segmentation with Transformer

Junde Wu, Wei Ji, Huazhu Fu et al.

AAAI 2024 • arXiv:2301.11798 • 274 citations

Memory Consolidation Enables Long-Context Video Understanding

Ivana Balazevic, Yuge Shi, Pinelopi Papalampidi et al.

ICML 2024 • oral

MetaAT: Active Testing for Label-Efficient Evaluation of Dense Recognition Tasks

Sanbao Su, Xin Li, Thang Doan et al.

ECCV 2024 • 2 citations

One-stage Prompt-based Continual Learning

Youngeun Kim, Yuhang Li, Priyadarshini Panda

ECCV 2024 • arXiv:2402.16189 • 17 citations

Outlier-aware Slicing for Post-Training Quantization in Vision Transformer

Yuexiao Ma, Huixia Li, Xiawu Zheng et al.

ICML 2024

Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning

Shiming Chen, Wenjin Hou, Salman Khan et al.

CVPR 2024 • arXiv:2404.07713 • 36 citations

Question Aware Vision Transformer for Multimodal Reasoning

Roy Ganz, Yair Kittenplon, Aviad Aberdam et al.

CVPR 2024 • highlight • arXiv:2402.05472 • 37 citations

Rejuvenating image-GPT as Strong Visual Representation Learners

Sucheng Ren, Zeyu Wang, Hongru Zhu et al.

ICML 2024 • arXiv:2312.02147 • 14 citations

S2WAT: Image Style Transfer via Hierarchical Vision Transformer Using Strips Window Attention

Chiyu Zhang, Xiaogang Xu, Lei Wang et al.

AAAI 2024 • arXiv:2210.12381 • 52 citations

Semantic-Aware Autoregressive Image Modeling for Visual Representation Learning

Kaiyou Song, Shan Zhang, Tong Wang

AAAI 2024 • arXiv:2312.10457 • 2 citations

SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design

Seokju Yun, Youngmin Ro

CVPR 2024 • arXiv:2401.16456 • 102 citations

SiT: Symmetry-invariant Transformers for Generalisation in Reinforcement Learning

Matthias Weissenbacher, Rishabh Agarwal, Yoshinobu Kawahara

ICML 2024 • arXiv:2406.15025 • 1 citation

Statistical Test for Attention Maps in Vision Transformers

Tomohiro Shiraishi, Daiki Miwa, Teruyuki Katsuoka et al.

ICML 2024

Stochastic positional embeddings improve masked image modeling

Amir Bar, Florian Bordes, Assaf Shocher et al.

ICML 2024 • arXiv:2308.00566 • 6 citations

Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention

Xingyu Zhou, Leheng Zhang, Xiaorui Zhao et al.

CVPR 2024 • arXiv:2401.06312 • 34 citations

ViP: A Differentially Private Foundation Model for Computer Vision

Yaodong Yu, Maziar Sanjabi, Yi Ma et al.

ICML 2024 • arXiv:2306.08842 • 18 citations

Vision Transformer Off-the-Shelf: A Surprising Baseline for Few-Shot Class-Agnostic Counting

Zhicheng Wang, Liwen Xiao, Zhiguo Cao et al.

AAAI 2024 • arXiv:2305.04440 • 29 citations

ViT-Calibrator: Decision Stream Calibration for Vision Transformer

Lin Chen, Zhijie Jia, Lechao Cheng et al.

AAAI 2024 • arXiv:2304.04354 • 3 citations