Poster "vision transformer" Papers

37 papers found

A Hidden Stumbling Block in Generalized Category Discovery: Distracted Attention

Qiyu Xu, Zhanxuan Hu, Yu Duan et al.

ICCV 2025 · arXiv:2507.14315
3 citations

Beyond Token Probes: Hallucination Detection via Activation Tensors with ACT-ViT

Guy Bar-Shalom, Fabrizio Frasca, Yaniv Galron et al.

NeurIPS 2025 · arXiv:2510.00296
1 citation

Cross-modal Ship Re-Identification via Optical and SAR Imagery: A Novel Dataset and Method

Han Wang, Shengyang Li, Jian Yang et al.

ICCV 2025 · arXiv:2506.22027
6 citations

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

Alexey Bochkovskiy, Amaël Delaunoy, Hugo Germain et al.

ICLR 2025 · arXiv:2410.02073
316 citations

Efficient Concertormer for Image Deblurring and Beyond

Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien et al.

ICCV 2025 · arXiv:2404.06135

Enhancing Vision-Language Model with Unmasked Token Alignment

Hongsheng Li, Jihao Liu, Boxiao Liu et al.

ICLR 2025 · arXiv:2405.19009

FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning

Gaojian Wang, Feng Lin, Tong Wu et al.

CVPR 2025 · arXiv:2412.12032
11 citations

GSPN-2: Efficient Parallel Sequence Modeling

Hongjun Wang, Yitong Jiang, Collin McCarthy et al.

NeurIPS 2025 · arXiv:2512.07884

Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens

Qihang Fan, Huaibo Huang, Mingrui Chen et al.

ICCV 2025 · arXiv:2405.13337
3 citations

SPA: 3D Spatial-Awareness Enables Effective Embodied Representation

Haoyi Zhu, Honghui Yang, Yating Wang et al.

ICLR 2025 · arXiv:2410.08208
24 citations

Sparse autoencoders reveal selective remapping of visual concepts during adaptation

Hyesu Lim, Jinho Choi, Jaegul Choo et al.

ICLR 2025 · arXiv:2412.05276
31 citations

TADFormer: Task-Adaptive Dynamic TransFormer for Efficient Multi-Task Learning

Seungmin Baek, Soyul Lee, Hayeon Jo et al.

CVPR 2025 · arXiv:2501.04293
1 citation

Agglomerative Token Clustering

Joakim Bruslund Haurum, Sergio Escalera, Graham W. Taylor et al.

ECCV 2024 · arXiv:2409.11923
7 citations

CLIP-KD: An Empirical Study of CLIP Model Distillation

Chuanguang Yang, Zhulin An, Libo Huang et al.

CVPR 2024 · arXiv:2307.12732
86 citations

ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy

Kirill Vishniakov, Zhiqiang Shen, Zhuang Liu

ICML 2024 · arXiv:2311.09215
26 citations

Data-free Neural Representation Compression with Riemannian Neural Dynamics

Zhengqi Pei, Anran Zhang, Shuhui Wang et al.

ICML 2024

Enhancing Vision Transformer: Amplifying Non-Linearity in Feedforward Network Module

Yixing Xu, Chao Li, Dong Li et al.

ICML 2024

FairViT: Fair Vision Transformer via Adaptive Masking

Bowei Tian, Ruijie Du, Yanning Shen

ECCV 2024 · arXiv:2407.14799
2 citations

GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer

Ding Jia, Jianyuan Guo, Kai Han et al.

ICML 2024 · arXiv:2406.01210
51 citations

HEAL-SWIN: A Vision Transformer On The Sphere

Oscar Carlsson, Jan E. Gerken, Hampus Linander et al.

CVPR 2024 · arXiv:2307.07313
14 citations

IFTR: An Instance-Level Fusion Transformer for Visual Collaborative Perception

Shaohong Wang, Lu Bin, Xinyu Xiao et al.

ECCV 2024 · arXiv:2407.09857
8 citations

Information Flow in Self-Supervised Learning

Zhiquan Tan, Jingqin Yang, Weiran Huang et al.

ICML 2024 · arXiv:2309.17281
17 citations

Learning Adaptive and View-Invariant Vision Transformer for Real-Time UAV Tracking

Yongxin Li, Mengyuan Liu, You Wu et al.

ICML 2024

Learning with Unmasked Tokens Drives Stronger Vision Learners

Taekyung Kim, Sanghyuk Chun, Byeongho Heo et al.

ECCV 2024 · arXiv:2310.13593
3 citations

Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification

Pingping Zhang, Yuhao Wang, Yang Liu et al.

CVPR 2024 · arXiv:2403.10254
49 citations

MetaAT: Active Testing for Label-Efficient Evaluation of Dense Recognition Tasks

Sanbao Su, Xin Li, Thang Doan et al.

ECCV 2024
2 citations

One-stage Prompt-based Continual Learning

Youngeun Kim, Yuhang Li, Priyadarshini Panda

ECCV 2024 · arXiv:2402.16189
17 citations

Outlier-aware Slicing for Post-Training Quantization in Vision Transformer

Yuexiao Ma, Huixia Li, Xiawu Zheng et al.

ICML 2024

Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning

Shiming Chen, Wenjin Hou, Salman Khan et al.

CVPR 2024 · arXiv:2404.07713
36 citations

Rejuvenating image-GPT as Strong Visual Representation Learners

Sucheng Ren, Zeyu Wang, Hongru Zhu et al.

ICML 2024 · arXiv:2312.02147
14 citations

SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design

Seokju Yun, Youngmin Ro

CVPR 2024 · arXiv:2401.16456
102 citations

SiT: Symmetry-invariant Transformers for Generalisation in Reinforcement Learning

Matthias Weissenbacher, Rishabh Agarwal, Yoshinobu Kawahara

ICML 2024 · arXiv:2406.15025
1 citation

Statistical Test for Attention Maps in Vision Transformers

Tomohiro Shiraishi, Daiki Miwa, Teruyuki Katsuoka et al.

ICML 2024

Stochastic positional embeddings improve masked image modeling

Amir Bar, Florian Bordes, Assaf Shocher et al.

ICML 2024 · arXiv:2308.00566
6 citations

Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention

Xingyu Zhou, Leheng Zhang, Xiaorui Zhao et al.

CVPR 2024 · arXiv:2401.06312
34 citations

ViP: A Differentially Private Foundation Model for Computer Vision

Yaodong Yu, Maziar Sanjabi, Yi Ma et al.

ICML 2024 · arXiv:2306.08842
18 citations

When Will Gradient Regularization Be Harmful?

Yang Zhao, Hao Zhang, Xiuyuan Hu

ICML 2024 · arXiv:2406.09723
2 citations