Diff-Tracker: Text-to-Image Diffusion Models are Unsupervised Trackers

21 citations

arXiv:2407.08394

Published in ECCV 2024 (ranked #606 of 2387 papers by citations)

Authors

Zhengbo Zhang, Li Xu, Duo Peng, Hossein Rahmani, Jun Liu

Abstract

We introduce Diff-Tracker, a novel approach to the challenging task of unsupervised visual tracking that leverages a pre-trained text-to-image diffusion model. Our main idea is to exploit the rich knowledge encapsulated in the pre-trained diffusion model, such as its understanding of image semantics and structural information, to address unsupervised visual tracking. To this end, we design an initial prompt learner that enables the diffusion model to recognize the tracking target by learning a prompt representing the target. Furthermore, to let the prompt adapt dynamically to the target's movements, we propose an online prompt updater. Extensive experiments on five benchmark datasets demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance.
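The abstract does not describe how the prompt learner and the online updater are implemented. As a loose, hypothetical illustration of the two ideas only (not Diff-Tracker's method, which operates inside a pre-trained text-to-image diffusion model), the sketch below treats a frame as a list of per-location feature vectors, learns a "prompt" vector that responds strongly at the target location, and then updates that prompt online using its own prediction on the next frame. The function names, the logistic-loss objective, and the moving-average update are all assumptions chosen for simplicity.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def response_map(prompt, features):
    """Score each spatial location by its similarity to the prompt (hypothetical)."""
    return [dot(prompt, f) for f in features]


def learn_initial_prompt(features, target_mask, steps=200, lr=0.5, seed=0):
    """Hypothetical 'initial prompt learner': fit a prompt vector so target
    locations score high and background locations score low, using logistic
    regression trained by plain gradient descent."""
    dim = len(features[0])
    rng = random.Random(seed)
    prompt = [rng.uniform(-0.01, 0.01) for _ in range(dim)]
    for _ in range(steps):
        grad = [0.0] * dim
        for f, y in zip(features, target_mask):
            err = sigmoid(dot(prompt, f)) - y  # gradient of logistic loss w.r.t. the logit
            for k in range(dim):
                grad[k] += err * f[k]
        prompt = [p - lr * g / len(features) for p, g in zip(prompt, grad)]
    return prompt


def locate(prompt, features):
    """Predicted target location: the index with the highest response."""
    scores = response_map(prompt, features)
    return max(range(len(scores)), key=scores.__getitem__)


def update_prompt(prompt, features, momentum=0.9, **kw):
    """Hypothetical 'online prompt updater': relearn a prompt on the current frame,
    using the predicted location as a pseudo-label, then blend it with the old
    prompt via an exponential moving average."""
    idx = locate(prompt, features)
    mask = [1.0 if i == idx else 0.0 for i in range(len(features))]
    fresh = learn_initial_prompt(features, mask, **kw)
    return [momentum * p + (1 - momentum) * q for p, q in zip(prompt, fresh)]


if __name__ == "__main__":
    # Toy frames: 4 locations with 2-D features; the target moves from index 1 to 3
    # and its appearance drifts slightly.
    frame0 = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.1, 0.9]]
    frame1 = [[0.0, 1.0], [0.1, 0.9], [0.0, 1.0], [0.9, 0.2]]
    prompt = learn_initial_prompt(frame0, [0, 1, 0, 0])
    print(locate(prompt, frame1))  # → 3
    prompt = update_prompt(prompt, frame1)
```

The moving average keeps the prompt anchored to what was learned on the first frame while letting it drift toward the target's current appearance; in this toy, self-labelling from the argmax stands in for the absence of ground-truth boxes in the unsupervised setting.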

Citation History (Jan 25 to Feb 13, 2026): 21 citations as of Feb 13, 2026.