Self-Supervised Bird’s Eye View Motion Prediction with Cross-Modality Signals

8 citations · #983 of 2289 papers in AAAI 2024

Abstract

Learning the dense bird's eye view (BEV) motion flow in a self-supervised manner is an emerging research direction for robotics and autonomous driving. Current self-supervised methods rely mainly on point correspondences between point clouds, which can introduce fake flow and inconsistency, hindering the model's ability to learn accurate and realistic motion. In this paper, we introduce a novel cross-modality self-supervised training framework that effectively addresses these issues by leveraging multi-modality data to obtain supervision signals. We design three innovative supervision signals to preserve the inherent properties of scene motion: the masked Chamfer distance loss, the piecewise rigidity loss, and the temporal consistency loss. Through extensive experiments, we demonstrate that our proposed self-supervised framework outperforms all previous self-supervised methods on the motion prediction task.
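The masked Chamfer distance loss named in the abstract builds on the standard symmetric Chamfer distance between point clouds; the paper's masking scheme is not detailed here, so the following is a minimal NumPy sketch of the plain (unmasked) metric only, with illustrative function and variable names:

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between two point clouds.

    a: (N, 3) array, b: (M, 3) array. Returns the mean nearest-neighbor
    distance from a to b plus the mean from b to a.
    """
    # Pairwise Euclidean distances via broadcasting, shape (N, M).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # For each point in a, its nearest neighbor in b, and vice versa.
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())
```

In a self-supervised flow setting, `a` would typically be the source cloud warped by the predicted motion and `b` the target cloud; the brute-force pairwise matrix is fine for small clouds but a k-d tree would be used at scale.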

Citation History

Jan 28, 2026: 0
Feb 13, 2026: 8