Counterfactual Evolution of Multimodal Datasets via Visual Programming

0 citations · Ranked #3347 of 5858 papers in NeurIPS 2025

Abstract

The rapid development of Multimodal Large Language Models (MLLMs) places increasing demands on the diversity and complexity of multimodal datasets, yet manual annotation pipelines can no longer keep pace. Existing augmentation methods often follow fixed rules and lack verifiable control over sample diversity and reasoning complexity. To address this, we introduce Scalable COunterfactual Program Evolution (SCOPE), a framework that uses symbolic Visual Programming to guide program evolution via counterfactual reasoning. SCOPE performs the three steps of counterfactual inference: (1) Abduction, by generating verifiable programs that model the reasoning underlying each sample; (2) Action, by intervening on program structure along three axes (reasoning path, visual context, and cross-instance composition); and (3) Prediction, by categorizing evolved instances by difficulty, structure, and input multiplicity. Based on this process, we build SCOPE-Train and SCOPE-Test, evolving benchmarks with expert validation. To support training, we propose MAP, a curriculum learning strategy that aligns model capacity with sample difficulty. Experiments show that SCOPE improves reasoning performance, exposes model blind spots, and enhances visual dialog capabilities.
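
Since no implementation appears on this page, the following is a minimal, hypothetical Python sketch of the abduction → action → prediction loop the abstract describes. Every name here (VisualProgram, abduction, action, prediction) and every concrete intervention is an illustrative assumption, not SCOPE's actual code.

```python
from dataclasses import dataclass


@dataclass
class VisualProgram:
    """A symbolic visual program: ordered reasoning steps over one or more images."""
    steps: list[str]
    image_ids: list[str]


def abduction(question: str, image_id: str) -> VisualProgram:
    """Step 1 (Abduction): model the reasoning behind a QA pair as a
    verifiable program. Stubbed here with a fixed two-step program."""
    return VisualProgram(
        steps=[f"locate(objects_in({question!r}))", "count(result)"],
        image_ids=[image_id],
    )


def action(program: VisualProgram, axis: str) -> VisualProgram:
    """Step 2 (Action): intervene on the program along one of the three
    axes named in the abstract. The specific edits are placeholders."""
    evolved = VisualProgram(list(program.steps), list(program.image_ids))
    if axis == "reasoning_path":      # e.g. insert an extra reasoning hop
        evolved.steps.insert(1, "filter(result, attribute='color')")
    elif axis == "visual_context":    # e.g. swap in an edited image
        evolved.image_ids = [f"{i}_edited" for i in evolved.image_ids]
    elif axis == "cross_instance":    # e.g. compose with a second instance
        evolved.image_ids.append("second_image")
        evolved.steps.append("compare(result, result_from(second_image))")
    return evolved


def prediction(program: VisualProgram) -> dict:
    """Step 3 (Prediction): categorize the evolved instance by difficulty
    (program length), structure, and input multiplicity (image count)."""
    return {
        "difficulty": len(program.steps),
        "structure": "compositional" if len(program.steps) > 2 else "flat",
        "multiplicity": len(program.image_ids),
    }


if __name__ == "__main__":
    prog = abduction("How many red cars are there?", "img_001")
    for axis in ("reasoning_path", "visual_context", "cross_instance"):
        print(axis, "->", prediction(action(prog, axis)))
```

The point of the sketch is the shape of the pipeline: each intervention axis yields a structurally different program, and the difficulty/structure/multiplicity labels produced in the prediction step are the kind of metadata a curriculum strategy such as MAP could consume when matching samples to model capacity.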
