Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis

6 citations

arXiv:2501.09333

Published in CVPR 2025; ranked #1233 of 2873 papers by citations

Authors

Arpita Chowdhury, Dipanjyoti Paul, Zheda Mai, Jianyang Gu, Ziheng Zhang, Kazi Sajeed Mehrab, Elizabeth Campolongo, Daniel Rubenstein, Charles Stewart, Anuj Karpatne, Tanya Berger-Wolf, Yu Su, Wei-Lun Chao

Topics

vision transformers, interpretable AI, fine-grained classification, attention maps, visual prompt tuning, saliency maps, class-specific prompts, trait localization

Abstract

We present a simple approach to make pre-trained Vision Transformers (ViTs) interpretable for fine-grained analysis, aiming to identify and localize the traits that distinguish visually similar categories, such as bird species. Pre-trained ViTs, such as DINO, have demonstrated remarkable capabilities in extracting localized, discriminative features. However, saliency maps like Grad-CAM often fail to identify these traits, producing blurred, coarse heatmaps that highlight entire objects instead. We propose a novel approach, Prompt Class Attention Map (Prompt-CAM), to address this limitation. Prompt-CAM learns class-specific prompts for a pre-trained ViT and uses the corresponding outputs for classification. To correctly classify an image, the true-class prompt must attend to unique image patches not present in other classes' images (i.e., traits). As a result, the true class's multi-head attention maps reveal traits and their locations. Implementation-wise, Prompt-CAM is almost a "free lunch," requiring only a modification to the prediction head of Visual Prompt Tuning (VPT). This makes Prompt-CAM easy to train and apply, in stark contrast to other interpretable methods that require designing specific models and training processes. Extensive empirical studies on a dozen datasets from various domains (e.g., birds, fishes, insects, fungi, flowers, food, and cars) validate the superior interpretation capability of Prompt-CAM. The source code and demo are available at https://github.com/Imageomics/Prompt_CAM.
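The mechanism described in the abstract (one prompt per class, each prompt's output scored for classification, and the true class's attention read as a trait map) can be illustrated with a toy NumPy sketch. This is not the authors' implementation; the linked repository contains that. It assumes a single attention layer in which C class prompts act as queries over N image-patch tokens, random stand-in weights, and one shared scoring vector `w_score` that maps each prompt's output to a class logit. All names (`prompt_cam_last_layer`, `w_score`, and so on) are hypothetical, and a real VPT setup inserts prompts into a frozen ViT with residual connections, layer norms, and MLPs that are omitted here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prompt_cam_last_layer(patches, prompts, Wq, Wk, Wv, w_score, num_heads):
    """Hypothetical, simplified final layer: class prompts attend to patches.

    patches: (N, D) patch tokens (stand-in for frozen ViT features)
    prompts: (C, D) one prompt per class
    Wq, Wk, Wv: (D, D) projection matrices (assumed, random here)
    w_score: (D,) shared vector turning each prompt output into one logit
    Returns logits (C,) and attention maps (C, H, N).
    """
    C, D = prompts.shape
    N = patches.shape[0]
    d_h = D // num_heads
    # Queries come from class prompts; keys/values from image patches only
    # (a simplification: real prompts also attend to other tokens).
    q = (prompts @ Wq).reshape(C, num_heads, d_h).transpose(1, 0, 2)  # (H, C, d_h)
    k = (patches @ Wk).reshape(N, num_heads, d_h).transpose(1, 0, 2)  # (H, N, d_h)
    v = (patches @ Wv).reshape(N, num_heads, d_h).transpose(1, 0, 2)  # (H, N, d_h)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_h), axis=-1)  # (H, C, N)
    out = (attn @ v).transpose(1, 0, 2).reshape(C, D)  # per-prompt outputs
    logits = out @ w_score  # one score per class prompt
    return logits, attn.transpose(1, 0, 2)  # attention as (C, H, N)

# Toy usage: a 4x4 patch grid with random features and weights.
rng = np.random.default_rng(0)
N, D, C, H = 16, 32, 5, 4
patches = rng.standard_normal((N, D))
prompts = rng.standard_normal((C, D))
Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
w_score = rng.standard_normal(D)

logits, attn = prompt_cam_last_layer(patches, prompts, Wq, Wk, Wv, w_score, H)
pred = int(np.argmax(logits))
# Per-head heatmaps for the predicted class, reshaped to the patch grid.
heatmaps = attn[pred].reshape(H, 4, 4)
```

In this sketch, each class logit depends only on the patches that class's own prompt attended to, so the per-head rows of `attn[pred]` can be viewed directly as heatmaps. That is the intuition behind reading traits from the true class's multi-head attention maps rather than from a gradient-based saliency map.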
