LightningDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos

arXiv:2405.13722
12 citations
#472 of 3340 papers in ICML 2025

Abstract

Accuracy and speed are critical in image editing tasks. Pan et al. introduced a drag-based framework using Generative Adversarial Networks, and subsequent studies have leveraged large-scale diffusion models. However, these methods often require over a minute per edit and exhibit low success rates. We present LightningDrag, which achieves high-quality drag-based editing in about one second on general images. By redefining drag-based editing as a conditional generation task, we eliminate the need for time-consuming latent optimization or gradient-based guidance. Our model is trained on large-scale paired video frames, capturing diverse motion (object translations, pose shifts, zooming, etc.) to significantly improve accuracy and consistency. Despite being trained only on videos, our model generalizes to local deformations beyond the training data (e.g., lengthening hair, twisting rainbows). Extensive evaluations confirm the superiority of our approach, and we will release both code and model.
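The abstract's key idea is replacing per-image latent optimization with a single conditional forward pass: the drag instruction (handle and target points) becomes an input to the generator rather than an objective to optimize against. The sketch below illustrates that interface contrast only; the function name, signature, and the trivial "model" (a mean-shift translation) are hypothetical stand-ins, not the paper's actual method or API.

```python
import numpy as np

def drag_edit_single_pass(image, handles, targets):
    """Hypothetical single-pass conditional edit.

    The drag points are conditioning inputs consumed in one forward
    pass (no per-edit optimization loop). As a toy stand-in for a
    generative model, we translate the image by the mean drag vector.
    """
    shift = np.mean(np.asarray(targets) - np.asarray(handles), axis=0)
    dy, dx = int(round(shift[0])), int(round(shift[1]))
    return np.roll(image, shift=(dy, dx), axis=(0, 1))

# For contrast, optimization-based editors (e.g., GAN/diffusion latent
# guidance) would instead run many gradient steps per edit, roughly:
#   for step in range(n_steps):
#       latent -= lr * grad(drag_loss(latent, handles, targets))
# which is what makes them take upward of a minute per image.

if __name__ == "__main__":
    img = np.zeros((8, 8))
    img[2, 2] = 1.0  # a single "feature" pixel at the handle point
    out = drag_edit_single_pass(img, handles=[(2, 2)], targets=[(4, 5)])
    print(out[4, 5])  # the feature has moved to the target point
```

The design point being illustrated: once the drag is an input rather than a loss, edit latency is one network evaluation, which is how the paper's roughly one-second runtime becomes possible.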

Citation History

Jan 28, 2026: 0
Feb 13, 2026: 12