Supervising Sound Localization by In-the-wild Egomotion

CVPR 2025

Top Authors

Anna Min, Ziyang Chen, Hang Zhao, Andrew Owens

Abstract

We present a method for learning binaural sound localization using egomotion as a supervisory signal. Over the course of a video, the camera’s direction to a sound source will change as the camera moves. We train an audio model to predict sound directions that are consistent with visual estimates of camera motion, which we obtain using traditional methods from multi-view geometry. This provides a weak but plentiful form of supervision that we combine with traditional binaural cues. To evaluate this method, we propose a dataset of real-world audio-visual videos with egomotion. We show that our model can successfully learn from real-world data and that it performs well on sound localization tasks.
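The consistency idea in the abstract can be illustrated with a minimal sketch. This is not the paper's loss function. It assumes a static, distant sound source and a camera that rotates only in yaw, so that a camera turn of `camera_yaw` shifts the source azimuth in the camera frame by `-camera_yaw`. The function names `wrap_angle` and `rotation_consistency_loss` are hypothetical and chosen only for illustration.

```python
import math


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def rotation_consistency_loss(az_t1, az_t2, camera_yaw):
    """Squared angular mismatch between audio and visual estimates.

    az_t1, az_t2: azimuths (radians) predicted from audio at two times.
    camera_yaw:   camera rotation (radians) between those times, as
                  estimated visually; positive means counter-clockwise.

    Assumption (not from the paper): the source is static and far away,
    so rotating the camera by camera_yaw changes the source azimuth in
    the camera frame by -camera_yaw.
    """
    predicted_change = wrap_angle(az_t2 - az_t1)
    expected_change = wrap_angle(-camera_yaw)
    return wrap_angle(predicted_change - expected_change) ** 2


# A source at 30 degrees; the camera turns 20 degrees counter-clockwise,
# so a consistent prediction places the source at 10 degrees afterwards.
consistent = rotation_consistency_loss(
    math.radians(30), math.radians(10), math.radians(20)
)
# A prediction that ignores the rotation is penalized.
inconsistent = rotation_consistency_loss(
    math.radians(30), math.radians(30), math.radians(20)
)
```

In this toy setup the loss is near zero when the two audio predictions differ by exactly the negated camera rotation, and grows with the angular mismatch. Wrapping the difference keeps the penalty well behaved when azimuths cross the ±180 degree boundary.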
