Adapting Text-to-Image Generation with Feature Difference Instruction for Generic Image Restoration

2 citations
Ranked #1998 of 2873 papers in CVPR 2025
6 Top Authors
2 Data Points

Abstract

Diffusion-based Text-to-Image (T2I) models have demonstrated significant potential in image restoration. However, existing models still struggle with complex training and prompt design. We introduce a new perspective for improving image restoration by injecting knowledge from pretrained vision-language models into current T2I models. We empirically show that the degradation and content representations in BLIP-2 can be linearly separated, providing promising degradation guidance for image restoration. Specifically, the Feature Difference Instruction (FDI) is first extracted by Q-Formers through a simple subtraction over reference image pairs. We then propose a multi-scale FDI adapter that decouples the degradation style from corrupted artifacts and injects the style flow exclusively into specific blocks via adapter-tuning, thereby preventing noise interference and avoiding cumbersome retraining of the base weights. In this way, we can train task-specific adapters for different degradations, achieving rich detail enhancement in the restoration results. Furthermore, the proposed FDI adapters have practically valuable properties, such as composability and generalization to all-in-one and mixed-degradation restoration. Extensive experiments under various settings demonstrate that our method achieves promising restoration quality across 10 image restoration tasks and a wide range of other applications.
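The central mechanism is easy to state in code: FDI is a subtraction in the Q-Former embedding space, computed once from a degraded/clean reference pair and then passed to the adapter as degradation guidance. The sketch below illustrates only that step; `QFormerStub`, its query count, and the embedding dimension are hypothetical stand-ins for the frozen BLIP-2 Q-Former, not the paper's released implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative stand-in for a frozen BLIP-2 Q-Former: learned queries
# cross-attend to image patch features and return a fixed-size set of
# query embeddings. All sizes here are assumptions for the sketch.
class QFormerStub(nn.Module):
    def __init__(self, num_queries: int = 32, dim: int = 768):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    @torch.no_grad()  # kept frozen, as in adapter-tuning
    def forward(self, img: torch.Tensor) -> torch.Tensor:
        feats = self.patchify(img).flatten(2).transpose(1, 2)        # (B, N, dim)
        q = self.queries.unsqueeze(0).expand(img.shape[0], -1, -1)   # (B, Q, dim)
        out, _ = self.attn(q, feats, feats)  # queries attend to patch features
        return out                                                   # (B, Q, dim)

def feature_difference_instruction(q_former, degraded_ref, clean_ref):
    """FDI = Q-Former(degraded reference) - Q-Former(clean reference).

    If degradation and content are linearly separable in the embedding
    space, the shared content terms cancel and the difference isolates
    the degradation direction, which can condition a T2I adapter.
    """
    return q_former(degraded_ref) - q_former(clean_ref)

q_former = QFormerStub()
degraded_ref = torch.rand(1, 3, 224, 224)  # e.g. a hazy reference image
clean_ref = torch.rand(1, 3, 224, 224)     # its clean counterpart
fdi = feature_difference_instruction(q_former, degraded_ref, clean_ref)
print(fdi.shape)  # torch.Size([1, 32, 768])
```

Because the Q-Former stays frozen, only the downstream FDI adapter would receive gradients during adapter-tuning, which is what lets one train a separate lightweight adapter per degradation type.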

Citation History

Jan 26, 2026: 1
Feb 1, 2026: 2 (+1)