JTD-UAV: MLLM-Enhanced Joint Tracking and Description Framework for Anti-UAV Systems

CVPR 2025 · 5 citations · #1373 of 2873 papers

Abstract

Unmanned Aerial Vehicles (UAVs) are widely adopted across various fields, yet they raise significant privacy and safety concerns, demanding robust monitoring solutions. Existing anti-UAV methods primarily focus on position tracking but fail to capture UAV behavior and intent. To address this, we introduce a novel task, UAV Tracking and Intent Understanding (UTIU), which aims to track UAVs while inferring and describing their motion states and intent for more comprehensive monitoring. To tackle this task, we propose JTD-UAV, the first joint tracking and intent description framework based on large language models. Its dual-branch architecture integrates UAV tracking with Visual Question Answering (VQA), allowing simultaneous localization and behavior description. To benchmark the task, we introduce the TDUAV dataset, the largest dataset for joint UAV tracking and intent understanding, featuring 1,328 challenging video sequences, over 163K annotated thermal frames, and 3K VQA pairs. Experiments on this benchmark demonstrate the effectiveness of JTD-UAV.
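To make the UTIU output contract concrete, here is a minimal sketch of a dual-branch pipeline: a tracking branch that produces one box per frame, and a VQA branch that answers a question about the UAV's motion or intent. This is not the JTD-UAV implementation. All names (`UTIUOutput`, `run_dual_branch`, `track_step`, `describe`) are hypothetical, the `(x, y, w, h)` box format is an assumption, and running the description branch after tracking (conditioned on the predicted trajectory) is an illustrative simplification of how the two branches might be integrated.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical box format: (x, y, w, h) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class UTIUOutput:
    """Joint output for one video under the UTIU task (illustrative only)."""
    boxes: List[Box]  # tracking branch: one box per frame
    question: str     # VQA branch: query about motion state or intent
    answer: str       # VQA branch: free-text description


def run_dual_branch(
    frames: Sequence[object],
    init_box: Box,
    track_step: Callable[[object, Box], Box],
    describe: Callable[[Sequence[object], List[Box], str], str],
    question: str,
) -> UTIUOutput:
    """Run a hypothetical tracking branch frame by frame, then a VQA branch
    conditioned on the frames and the predicted trajectory."""
    boxes: List[Box] = []
    box = init_box
    for frame in frames:
        box = track_step(frame, box)  # localize the UAV in this frame
        boxes.append(box)
    answer = describe(frames, boxes, question)  # describe motion or intent
    return UTIUOutput(boxes=boxes, question=question, answer=answer)


# Toy stand-ins for the two branches (hypothetical, for demonstration only).
def shift_right(frame: object, box: Box) -> Box:
    """Pretend tracker: the target moves 2 px to the right every frame."""
    x, y, w, h = box
    return (x + 2.0, y, w, h)


def toy_describe(frames: Sequence[object], boxes: List[Box], question: str) -> str:
    """Pretend VQA branch: derive a motion description from the trajectory."""
    dx = boxes[-1][0] - boxes[0][0]
    return "The UAV is moving right." if dx > 0 else "The UAV is not moving right."


out = run_dual_branch(
    frames=[None] * 5,
    init_box=(10.0, 20.0, 8.0, 8.0),
    track_step=shift_right,
    describe=toy_describe,
    question="What is the UAV doing?",
)
print(len(out.boxes), out.boxes[-1], out.answer)
```

In a real system both stubs would be replaced by learned models, and the branches could share visual features rather than run strictly in sequence; the sketch only fixes the shape of what the task asks for, a trajectory plus a natural-language answer per video.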

Citation History

Jan 26, 2026: 0
Jan 27, 2026: 0
Feb 1, 2026: 0
Feb 6, 2026: 5 (+5)