Pruning All-Rounder: Rethinking and Improving Inference Efficiency for Large Vision Language Models

1 citation
#1381 of 2701 papers in ICCV 2025
6 authors

Abstract

Although Large Vision-Language Models (LVLMs) have achieved impressive results, their high computational costs pose a significant barrier to wide application. To enhance inference efficiency, most existing approaches fall into parameter-dependent or token-dependent strategies for reducing computational demands. However, parameter-dependent methods require retraining LVLMs to recover performance, while token-dependent strategies struggle to consistently select the most relevant tokens. In this paper, we systematically analyze these challenges and provide a series of valuable insights for inference acceleration. Based on these findings, we propose a novel framework, the Pruning All-Rounder (PAR). Unlike previous works, PAR develops a meta-router to adaptively organize pruning flows across both tokens and layers. Trained in a self-supervised manner, our method achieves a superior balance between performance and efficiency. Notably, PAR is highly flexible, offering multiple pruning versions to address a range of acceleration scenarios. The code for this work is publicly available at https://github.com/ASGO-MM/Pruning-All-Rounder.
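The abstract does not spell out how the meta-router is built, but the core idea (scoring visual tokens and transformer layers, then pruning along both axes) can be sketched. Below is a minimal, hypothetical PyTorch sketch of that idea; `MetaRouter`, `token_scorer`, `layer_scorer`, and `keep_ratio` are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn


class MetaRouter(nn.Module):
    """Hypothetical sketch: score tokens and layers for pruning.

    The real PAR architecture is not described in the abstract;
    all module names and shapes here are assumptions.
    """

    def __init__(self, hidden_dim: int, num_layers: int):
        super().__init__()
        self.token_scorer = nn.Linear(hidden_dim, 1)           # per-token keep score
        self.layer_scorer = nn.Linear(hidden_dim, num_layers)  # per-layer keep score

    def forward(self, tokens: torch.Tensor, keep_ratio: float = 0.5):
        # tokens: (batch, seq_len, hidden_dim)
        token_logits = self.token_scorer(tokens).squeeze(-1)   # (B, N)
        k = max(1, int(tokens.size(1) * keep_ratio))
        keep_idx = token_logits.topk(k, dim=1).indices
        keep_idx = keep_idx.sort(dim=1).values                 # preserve token order
        pruned = torch.gather(
            tokens, 1, keep_idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
        )                                                      # (B, k, D)
        # Pool the kept tokens to decide which layers to execute or skip.
        layer_logits = self.layer_scorer(pruned.mean(dim=1))   # (B, num_layers)
        layer_mask = layer_logits.sigmoid() > 0.5              # True = run this layer
        return pruned, keep_idx, layer_mask


# Usage: prune half of 196 ViT patch tokens and get a per-layer execution mask.
router = MetaRouter(hidden_dim=768, num_layers=32)
x = torch.randn(2, 196, 768)
pruned, idx, mask = router(x, keep_ratio=0.5)
print(pruned.shape, mask.shape)  # torch.Size([2, 98, 768]) torch.Size([2, 32])
```

In a sketch like this, a hard top-k selection is not differentiable; the paper's self-supervised training presumably uses a relaxation or learning signal the abstract does not specify.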

Citation History

Jan 26, 2026: 1
Feb 2, 2026: 1
Feb 7, 2026: 1
Feb 13, 2026: 1