Fair-VPT: Fair Visual Prompt Tuning for Image Classification

19 citations · #1312 of 2716 papers in CVPR 2024

Abstract

Despite the remarkable success of Vision Transformers (ViT) across diverse fields in computer vision, they have a clear drawback: expensive adaptation costs for downstream tasks due to their increased scale. To address this, Visual Prompt Tuning (VPT) incorporates learnable parameters in the input space of ViT. By freezing the ViT backbone and tuning only the prompts, VPT achieves performance superior to full fine-tuning. However, despite this outstanding advantage, we point out that VPT may lead to serious unfairness in downstream classification. We first investigate the causes of unfairness in VPT and identify the biased pre-trained ViT as a principal factor. Motivated by this observation, we propose Fair Visual Prompt Tuning (Fair-VPT), which removes biased information from the pre-trained ViT while adapting it to downstream classification tasks. To this end, we categorize the prompts into “cleaner prompts” and “target prompts”, and we encode the class token in two different ways, by either masking or not masking the target prompts in the self-attention process. The two encoded tokens are trained with distinct objective functions, so that the target and cleaner prompts come to carry different information. Moreover, we introduce a disentanglement loss based on contrastive learning to further decorrelate them. In experiments across diverse benchmarks, the proposed method achieves the best performance in terms of balanced classification accuracy and fairness.
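
To make the masking mechanism in the abstract concrete, the sketch below shows how a class token could be encoded in two ways with a single self-attention layer: once attending to all tokens, and once with the target prompts masked out of the attention. This is a minimal illustration under an assumed token layout (CLS, cleaner prompts, target prompts, patch tokens) and assumed prompt counts, not the authors' implementation; the function `encode_class_token` and the parameters `n_cleaner` and `n_target` are hypothetical names.

```python
import torch
import torch.nn.functional as F

def encode_class_token(attn, tokens, n_cleaner, n_target, mask_target):
    """Encode the class token with one self-attention pass over
    [CLS | cleaner prompts | target prompts | patch tokens].

    If mask_target is True, the target prompt positions are excluded
    from attention, so the class token is encoded without their
    information (the "masked" encoding described in the abstract).
    """
    L = tokens.size(1)
    attn_mask = None
    if mask_target:
        # Boolean mask: True = this key position may NOT be attended to.
        attn_mask = torch.zeros(L, L, dtype=torch.bool)
        t0 = 1 + n_cleaner  # target prompts start after CLS + cleaner prompts
        attn_mask[:, t0:t0 + n_target] = True
    out, _ = attn(tokens, tokens, tokens, attn_mask=attn_mask)
    return out[:, 0]  # encoded class token

# Hypothetical usage with a single attention layer standing in for a ViT block.
attn = torch.nn.MultiheadAttention(embed_dim=768, num_heads=12, batch_first=True)
tokens = torch.randn(4, 1 + 5 + 5 + 196, 768)  # [CLS | 5 cleaner | 5 target | 196 patches]

cls_full   = encode_class_token(attn, tokens, 5, 5, mask_target=False)
cls_masked = encode_class_token(attn, tokens, 5, 5, mask_target=True)

# A contrastive-style disentanglement term could then decorrelate the two
# encodings, e.g. by penalizing their cosine similarity (one simple choice,
# not necessarily the loss used in the paper):
disentangle_loss = F.cosine_similarity(cls_full, cls_masked).abs().mean()
```

Training the two encodings with distinct objectives, plus a term like the one above, is what would drive the target and cleaner prompts to carry different information.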

Citation History

Jan 27, 2026: 18
Feb 13, 2026: 19 (+1)