Abstract
Virtual reality (VR) experiences often leverage rich, spatialized multimodal environments to increase immersion and engagement. This demands a consistent spatial perception of audiovisual stimuli, since perceived discrepancies can disrupt the sense of presence. In this work, we investigate the consequences of two types of spatial audiovisual disparity: true disparity, where there is a measurable spatial offset between auditory and visual cues, and perceptual disparity, where users report misalignment even though the cues are colocated. Unlike most previous studies, which employed controlled but simplified experimental setups, our research focuses on complex, realistic VR environments, allowing us to assess the practical implications for VR content design. Our experiments indicate that users are highly sensitive to true audiovisual disparities in controlled environments, detecting even minor misalignments. However, when engaged in additional tasks within realistic settings, their ability to notice such discrepancies diminishes significantly. We also observed that previously reported perceptual disparities persist in complex audiovisual environments, and we identify self-initiated head rotations as a key factor: in their absence, the effect does not occur at all. We hope our findings offer practical insights for designing more immersive and perceptually coherent VR experiences.