Decoupling Common and Unique Representations for Multimodal Self-supervised Learning

39 citations

arXiv:2309.05300

#319 by citations of 2387 papers in ECCV 2024

Top Authors

Yi Wang, Conrad M Albrecht, Nassim Ait Ali Braham, Chenying Liu, Zhitong Xiong, Xiao Xiang Zhu

Abstract

The increasing availability of multi-sensor data sparks interest in multimodal self-supervised learning. However, most existing approaches learn only common representations across modalities while ignoring intra-modal training and modality-unique representations. We propose Decoupling Common and Unique Representations (DeCUR), a simple yet effective method for multimodal self-supervised learning. By distinguishing inter- and intra-modal embeddings through multimodal redundancy reduction, DeCUR can integrate complementary information across different modalities. Meanwhile, a simple residual deformable attention is introduced to help the model focus on modality-informative features. We evaluate DeCUR in three common multimodal scenarios (radar-optical, RGB-elevation, and RGB-depth), and demonstrate its consistent and significant improvement for both multimodal and modality-missing settings. With thorough experiments and comprehensive analysis, we hope this work can provide insights and raise more interest in researching the hidden relationships of multimodal representations.
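
The abstract does not spell out the loss, so the following is only a rough sketch of what a decoupled cross-modal redundancy-reduction term could look like, not the authors' implementation. It assumes a cross-correlation objective in which the first `n_common` embedding dimensions are pulled toward correlation 1 across modalities and the remaining dimensions are pushed toward correlation 0. The function names, the `n_common` split, and the weight `lam` are all hypothetical, and the intra-modal term is left out.

```python
import random


def standardize(batch):
    """Normalise each embedding dimension to zero mean and unit variance over the batch."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        # Fall back to 1.0 so a constant dimension does not divide by zero.
        std = (sum((v - mean) ** 2 for v in col) / n) ** 0.5 or 1.0
        for i in range(n):
            out[i][j] = (batch[i][j] - mean) / std
    return out


def cross_correlation(za, zb):
    """D x D cross-correlation matrix between two batches of embeddings."""
    za, zb = standardize(za), standardize(zb)
    n, d = len(za), len(za[0])
    return [[sum(za[k][i] * zb[k][j] for k in range(n)) / n for j in range(d)]
            for i in range(d)]


def decoupled_loss(z1, z2, n_common, lam=0.005):
    """Hypothetical cross-modal term: common dims aligned, unique dims decorrelated."""
    c = cross_correlation(z1, z2)
    d = len(c)
    loss = 0.0
    for i in range(d):
        for j in range(d):
            if i == j and i < n_common:
                loss += (1.0 - c[i][j]) ** 2   # common dimension: correlation -> 1
            elif i == j:
                loss += c[i][j] ** 2           # unique dimension: correlation -> 0
            else:
                loss += lam * c[i][j] ** 2     # off-diagonal: reduce redundancy
    return loss


if __name__ == "__main__":
    random.seed(0)
    # Toy embeddings for two modalities (e.g. radar and optical), batch of 32, 4 dims.
    z_radar = [[random.gauss(0, 1) for _ in range(4)] for _ in range(32)]
    z_optical = [[random.gauss(0, 1) for _ in range(4)] for _ in range(32)]
    print(decoupled_loss(z_radar, z_optical, n_common=2))
```

In this sketch, passing the same batch for both modalities with `n_common` equal to the full embedding size leaves only the small off-diagonal penalty. Setting `n_common=0` instead penalises every perfectly correlated diagonal entry, which is the behaviour expected of dimensions reserved for modality-unique information.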

Citation History

[Chart: citation count tracked from Jan 25, 2026 to Feb 13, 2026]