Poster "vision transformer" Papers
37 papers found
A Hidden Stumbling Block in Generalized Category Discovery: Distracted Attention
Qiyu Xu, Zhanxuan Hu, Yu Duan et al.
Beyond Token Probes: Hallucination Detection via Activation Tensors with ACT-ViT
Guy Bar-Shalom, Fabrizio Frasca, Yaniv Galron et al.
Cross-modal Ship Re-Identification via Optical and SAR Imagery: A Novel Dataset and Method
Han Wang, Shengyang Li, Jian Yang et al.
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
Alexey Bochkovskiy, Amaël Delaunoy, Hugo Germain et al.
Efficient Concertormer for Image Deblurring and Beyond
Pin-Hung Kuo, Jinshan Pan, Shao-Yi Chien et al.
Enhancing Vision-Language Model with Unmasked Token Alignment
Hongsheng Li, Jihao Liu, Boxiao Liu et al.
FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning
Gaojian Wang, Feng Lin, Tong Wu et al.
GSPN-2: Efficient Parallel Sequence Modeling
Hongjun Wang, Yitong Jiang, Collin McCarthy et al.
Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens
Qihang Fan, Huaibo Huang, Mingrui Chen et al.
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
Haoyi Zhu, Honghui Yang, Yating Wang et al.
Sparse autoencoders reveal selective remapping of visual concepts during adaptation
Hyesu Lim, Jinho Choi, Jaegul Choo et al.
TADFormer: Task-Adaptive Dynamic TransFormer for Efficient Multi-Task Learning
Seungmin Baek, Soyul Lee, Hayeon Jo et al.
Agglomerative Token Clustering
Joakim Bruslund Haurum, Sergio Escalera, Graham W. Taylor et al.
CLIP-KD: An Empirical Study of CLIP Model Distillation
Chuanguang Yang, Zhulin An, Libo Huang et al.
ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy
Kirill Vishniakov, Zhiqiang Shen, Zhuang Liu
Data-free Neural Representation Compression with Riemannian Neural Dynamics
Zhengqi Pei, Anran Zhang, Shuhui Wang et al.
Enhancing Vision Transformer: Amplifying Non-Linearity in Feedforward Network Module
Yixing Xu, Chao Li, Dong Li et al.
FairViT: Fair Vision Transformer via Adaptive Masking
Bowei Tian, Ruijie Du, Yanning Shen
GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer
Ding Jia, Jianyuan Guo, Kai Han et al.
HEAL-SWIN: A Vision Transformer On The Sphere
Oscar Carlsson, Jan E. Gerken, Hampus Linander et al.
IFTR: An Instance-Level Fusion Transformer for Visual Collaborative Perception
Shaohong Wang, Lu Bin, Xinyu Xiao et al.
Information Flow in Self-Supervised Learning
Zhiquan Tan, Jingqin Yang, Weiran Huang et al.
Learning Adaptive and View-Invariant Vision Transformer for Real-Time UAV Tracking
Yongxin Li, Mengyuan Liu, You Wu et al.
Learning with Unmasked Tokens Drives Stronger Vision Learners
Taekyung Kim, Sanghyuk Chun, Byeongho Heo et al.
Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
Pingping Zhang, Yuhao Wang, Yang Liu et al.
MetaAT: Active Testing for Label-Efficient Evaluation of Dense Recognition Tasks
Sanbao Su, Xin Li, Thang Doan et al.
One-stage Prompt-based Continual Learning
Youngeun Kim, Yuhang Li, Priyadarshini Panda
Outlier-aware Slicing for Post-Training Quantization in Vision Transformer
Yuexiao Ma, Huixia Li, Xiawu Zheng et al.
Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning
Shiming Chen, Wenjin Hou, Salman Khan et al.
Rejuvenating image-GPT as Strong Visual Representation Learners
Sucheng Ren, Zeyu Wang, Hongru Zhu et al.
SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
Seokju Yun, Youngmin Ro
SiT: Symmetry-invariant Transformers for Generalisation in Reinforcement Learning
Matthias Weissenbacher, Rishabh Agarwal, Yoshinobu Kawahara
Statistical Test for Attention Maps in Vision Transformers
Tomohiro Shiraishi, Daiki Miwa, Teruyuki Katsuoka et al.
Stochastic positional embeddings improve masked image modeling
Amir Bar, Florian Bordes, Assaf Shocher et al.
Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention
Xingyu Zhou, Leheng Zhang, Xiaorui Zhao et al.
ViP: A Differentially Private Foundation Model for Computer Vision
Yaodong Yu, Maziar Sanjabi, Yi Ma et al.
When Will Gradient Regularization Be Harmful?
Yang Zhao, Hao Zhang, Xiuyuan Hu