Supervising Sound Localization by In-the-wild Egomotion

3 citations
Ranked #1765 of 2873 papers in CVPR 2025
4 top authors · 5 data points

Abstract

We present a method for learning binaural sound localization using egomotion as a supervisory signal. Over the course of a video, the camera’s direction to a sound source will change as the camera moves. We train an audio model to predict sound directions that are consistent with visual estimates of camera motion, which we obtain using traditional methods from multi-view geometry. This provides a weak but plentiful form of supervision that we combine with traditional binaural cues. To evaluate this method, we propose a dataset of real-world audio-visual videos with egomotion. We show that our model can successfully learn from real-world data and that it performs well on sound localization tasks.
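The abstract describes training an audio model so that its predicted sound directions stay consistent with visually estimated camera motion. As an illustration of this idea only (the paper does not specify its loss; the function name, tensor shapes, and cosine-distance form below are assumptions), a static source seen in frame-t camera coordinates should, after applying the camera rotation from frame t to frame t+1, match the prediction for frame t+1:

```python
import numpy as np

def egomotion_consistency_loss(pred_dirs, rotations):
    """Penalize predicted sound directions that disagree with camera egomotion.

    pred_dirs: (T, 3) unit vectors, the audio model's predicted source
               direction in each frame's camera coordinates (hypothetical).
    rotations: (T-1, 3, 3) rotations mapping frame-t coordinates to
               frame-(t+1) coordinates (e.g. from structure-from-motion).
    """
    loss = 0.0
    for t in range(len(rotations)):
        # For a static source, rotating the frame-t direction into the
        # frame-(t+1) coordinate system should reproduce the next prediction.
        expected = rotations[t] @ pred_dirs[t]
        # Cosine distance between predicted and egomotion-implied direction.
        loss += 1.0 - float(np.dot(pred_dirs[t + 1], expected))
    return loss / len(rotations)
```

With predictions that exactly follow the camera rotation, the loss is zero; predictions that ignore the egomotion are penalized, which is the weak supervisory signal the abstract refers to.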

Citation History

Jan 24, 2026: 0
Jan 26, 2026: 0
Jan 27, 2026: 0
Feb 3, 2026: 3 (+3)