3 citations, #1765 of 2873 papers in CVPR 2025
Abstract
We present a method for learning binaural sound localization using egomotion as a supervisory signal. Over the course of a video, the direction of a sound source relative to the camera changes as the camera moves. We train an audio model to predict sound directions that are consistent with visual estimates of camera motion, which we obtain using traditional methods from multi-view geometry. This provides a weak but plentiful form of supervision that we combine with traditional binaural cues. To evaluate this method, we propose a dataset of real-world audio-visual videos with egomotion. We show that our model can successfully learn from real-world data and that it performs well on sound localization tasks.
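The egomotion-consistency idea can be illustrated with a toy sketch. It is not the paper's formulation. It relies on several simplifying assumptions: sound sources are static and distant, only azimuth in the horizontal plane is predicted, and the camera motion between two frames t1 and t2 reduces to a yaw rotation Δψ, counterclockwise positive, already estimated from the video. Under those assumptions, the audio model's azimuth prediction at t1, shifted by −Δψ, should match its prediction at t2. The wrapped mismatch can then serve as a training loss. All function names are hypothetical.

```python
import math


def wrap_angle(a: float) -> float:
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def expected_azimuth(theta_t1: float, yaw_change: float) -> float:
    """Azimuth of a static, distant source at t2, given its azimuth at t1 and
    the camera's yaw rotation from t1 to t2 (counterclockwise positive).

    Assumption: pure rotation about the vertical axis; camera translation is
    ignored, which is reasonable only for distant sources.
    """
    return wrap_angle(theta_t1 - yaw_change)


def egomotion_consistency_loss(pred_t1: float, pred_t2: float, yaw_change: float) -> float:
    """Squared wrapped angular error between the model's t2 prediction and its
    t1 prediction transported by the visually estimated camera rotation."""
    err = wrap_angle(pred_t2 - expected_azimuth(pred_t1, yaw_change))
    return err * err


if __name__ == "__main__":
    # The camera turns 0.2 rad counterclockwise, so the source should appear
    # 0.2 rad further clockwise (0.5 -> 0.3 rad). A consistent pair costs ~0.
    print(egomotion_consistency_loss(0.5, 0.3, 0.2))
    # An inconsistent pair (no apparent change) is penalized by 0.2**2.
    print(egomotion_consistency_loss(0.5, 0.5, 0.2))
```

The loss compares two of the model's own predictions, so a constant offset added to every prediction leaves it unchanged. It constrains only relative direction changes. This is plausibly one reason such weak supervision would be paired with binaural cues, as the abstract describes.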
Citation History
Jan 24, 2026: 0
Jan 26, 2026: 0
Jan 27, 2026: 0
Feb 3, 2026: 3 (+3)