HarmoniCa: Harmonizing Training and Inference for Better Feature Caching in Diffusion Transformer Acceleration

10 citations · #558 of 3340 papers in ICML 2025

Abstract

Diffusion Transformers (DiTs) excel in generative tasks but face practical deployment challenges due to high inference costs. Feature caching, which stores and retrieves redundant computations, offers the potential for acceleration. Existing learning-based caching, though adaptive, overlooks the impact of the prior timestep. It also suffers from misaligned objectives between training and inference (*aligned predicted noise vs. high-quality images*). These two discrepancies compromise both performance and efficiency. To this end, we *harmonize* training and inference with a novel learning-based *caching* framework dubbed **HarmoniCa**. It first incorporates *Step-Wise Denoising Training* (SDT) to ensure the continuity of the denoising process, so that prior steps can be leveraged. In addition, an *Image Error Proxy-Guided Objective* (IEPO) balances image quality against cache utilization through an efficient proxy that approximates the image error. Extensive experiments across $8$ models, $4$ samplers, and resolutions from $256\times256$ to $2K$ demonstrate the superior performance and speedup of our framework. For instance, it achieves over $40\%$ latency reduction (*i.e.*, $2.07\times$ theoretical speedup) and improved performance on PixArt-$\alpha$. Remarkably, our *image-free* approach reduces training time by $25\%$ compared with the previous method. Our code is available at https://github.com/ModelTC/HarmoniCa.
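The core idea behind feature caching is to reuse a block's output from the previous denoising step instead of recomputing it, with a per-step, per-block decision governing the quality/efficiency trade-off. Below is a minimal PyTorch sketch of that mechanism; the names (`CachedDiTBlock`, `cache_logits`, `run_denoising_with_cache`) and the toy block structure are illustrative assumptions, not taken from the HarmoniCa implementation.

```python
# Minimal sketch of learning-based feature caching in a DiT-style block.
# All names and the toy architecture are illustrative assumptions, not the
# HarmoniCa implementation (see https://github.com/ModelTC/HarmoniCa).
import torch
import torch.nn as nn


class CachedDiTBlock(nn.Module):
    """Toy transformer block that can reuse its output from the prior step."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.cached = None  # feature stored at the previous timestep

    def forward(self, x: torch.Tensor, use_cache: bool) -> torch.Tensor:
        if use_cache and self.cached is not None:
            return self.cached  # reuse: skip attention and MLP entirely
        q = self.norm1(x)
        h = x + self.attn(q, q, q, need_weights=False)[0]
        h = h + self.mlp(self.norm2(h))
        self.cached = h  # store for potential reuse at the next step
        return h


def run_denoising_with_cache(blocks, x, num_steps, cache_logits, threshold=0.0):
    """Roll out all denoising steps; a per-(step, block) logit picks reuse vs. recompute."""
    for t in range(num_steps):
        for b, block in enumerate(blocks):
            # never reuse at the first step: nothing has been cached yet
            use_cache = t > 0 and cache_logits[t, b] > threshold
            x = block(x, use_cache)
    return x


if __name__ == "__main__":
    dim, depth, num_steps = 64, 4, 10
    blocks = nn.ModuleList(CachedDiTBlock(dim) for _ in range(depth))
    x = torch.randn(2, 16, dim)                   # (batch, tokens, dim)
    cache_logits = torch.randn(num_steps, depth)  # learnable in a real router
    out = run_denoising_with_cache(blocks, x, num_steps, cache_logits)
    print(out.shape)  # torch.Size([2, 16, 64])
```

In a real setup, `cache_logits` would be the learnable caching schedule that the paper's SDT and IEPO components optimize over the full denoising trajectory; here it is random only to keep the sketch self-contained.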

Citation History

Jan 28, 2026: 0
Feb 13, 2026: 10