Counterfactual Debiasing for Physical Audiovisual Commonsense Reasoning

AAAI 2025

Abstract

Physical commonsense is an essential aspect of human cognition, involving an intuitive understanding of the physical properties and interactions of everyday objects and materials. Though physical commonsense reasoning should inherently be a multisensory task, integrating both video and audio signals, existing physical audiovisual commonsense reasoning (PACR) models predominantly rely on visual information. This reliance leads to spurious correlations and undermines the models’ reasoning and generalization abilities. To counteract this, we introduce a model-agnostic Counterfactual Physical Audiovisual Commonsense Reasoning (CF-PACR) framework aimed at mitigating visual bias-induced spurious effects. Specifically, we construct a traditional PACR model using both audio and visual information as the factual reasoning model. Subsequently, in the counterfactual reasoning model, we isolate visual information to estimate direct effects. Finally, we subtract the direct effects from the total effects across modalities to derive indirect effects, thereby mitigating visual biases. Extensive experiments validate the effectiveness and generalizability of CF-PACR in alleviating the spurious correlations between visual modality and model predictions.
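
The subtraction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the factual model and the visual-only counterfactual model each output per-answer logits, and that the debiased score is their plain difference. The function name `debias_logits`, the scaling parameter `alpha`, and the toy numbers are all hypothetical and chosen only for illustration.

```python
import numpy as np


def debias_logits(fused_logits, visual_only_logits, alpha=1.0):
    """Return total effect minus (scaled) direct visual effect.

    fused_logits: scores from a factual model that sees audio and video
        (assumed to stand in for the total effect).
    visual_only_logits: scores from a counterfactual branch that sees only
        the visual input (assumed to stand in for the direct effect).
    alpha: hypothetical scaling factor; 1.0 gives a plain subtraction.
    """
    fused = np.asarray(fused_logits, dtype=float)
    visual = np.asarray(visual_only_logits, dtype=float)
    return fused - alpha * visual


# Toy two-choice question: the visual branch alone strongly favours answer 0.
fused = np.array([[2.0, 1.5]])    # hypothetical audio+visual scores
visual = np.array([[1.8, 0.2]])   # hypothetical visual-only scores

debiased = debias_logits(fused, visual)

factual_choice = int(np.argmax(fused, axis=-1)[0])
debiased_choice = int(np.argmax(debiased, axis=-1)[0])
print(factual_choice, debiased_choice)
```

In this toy case the factual model picks answer 0 largely because of the visual shortcut, and removing the visual-only contribution flips the prediction to answer 1. How the actual framework fuses modalities and computes the effects is not specified in the abstract.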
