What Makes Object Referencing Clear? Multimodal Strategies for Shared Understanding in XR Collaboration

0 citations · ranked #33 of 229 papers in ISMAR 2025

Abstract

Effective shared object referencing is essential for collaboration in multi-user Extended Reality (XR) environments. While prior work has explored gaze, pointing, and speech, most studies have focused on individual selection tasks, leaving a gap in understanding how these modalities support mutual comprehension. This study investigates optimal combinations of gaze, pointing, and speech under varying distances ($1~\mathrm{m}$, $8~\mathrm{m}$, and $15~\mathrm{m}$), viewpoints, and object arrangements. Results show that multimodal combinations significantly improved referencing accuracy and reduced cognitive load compared to gaze-only interaction. Pointing gestures were highly effective in resolving viewpoint discrepancies. Speech provided high referencing accuracy but increased selection time, while gaze enabled fast referencing with reduced accuracy at longer distances. The integration of all three modalities achieved the best balance of speed, accuracy, and cognitive load. These findings offer practical guidance for designing XR systems that support clear and efficient shared referencing across diverse collaborative conditions.
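
To make the idea of combining the three modalities concrete, the sketch below ranks candidate objects by fusing angular proximity to a gaze ray, angular proximity to a pointing ray, and a crude lexical match against a spoken phrase. This is a minimal illustration under assumed conventions; the weights, function names, and data layout are not taken from the paper and do not represent the authors' implementation.

```python
# Illustrative sketch (not the paper's method): fuse gaze, pointing, and speech
# cues to rank candidate objects for shared referencing. All weights and names
# here are assumptions for demonstration only.
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    position: tuple  # (x, y, z) in metres, world space


def angle_between(v1, v2):
    """Angle in radians between two 3D vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(b * b for b in v2))
    return math.acos(max(-1.0, min(1.0, dot / (n1 * n2))))


def ray_score(origin, direction, target):
    """Score in [0, 1]; higher when the target lies close to the ray direction."""
    to_target = tuple(t - o for t, o in zip(target, origin))
    return 1.0 - angle_between(direction, to_target) / math.pi


def speech_score(utterance, candidate_name):
    """Crude lexical match between the spoken phrase and the object label."""
    return 1.0 if candidate_name.lower() in utterance.lower() else 0.0


def rank_candidates(candidates, gaze, pointer, utterance,
                    w_gaze=0.3, w_point=0.4, w_speech=0.3):
    """Combine the three modality scores with illustrative, hand-picked weights."""
    scored = []
    for c in candidates:
        s = (w_gaze * ray_score(gaze["origin"], gaze["direction"], c.position)
             + w_point * ray_score(pointer["origin"], pointer["direction"], c.position)
             + w_speech * speech_score(utterance, c.name))
        scored.append((s, c))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    objects = [Candidate("red cube", (2.0, 0.0, 8.0)),
               Candidate("blue sphere", (-1.0, 0.0, 8.0))]
    gaze = {"origin": (0.0, 1.6, 0.0), "direction": (0.2, -0.1, 1.0)}
    pointer = {"origin": (0.3, 1.2, 0.0), "direction": (0.25, -0.1, 1.0)}
    best_score, best = rank_candidates(objects, gaze, pointer, "the red cube on the left")[0]
    print(best.name)  # -> "red cube"
```

In a sketch like this, speech acts as a strong disambiguator when labels are known, while the two rays carry most of the signal when objects are unlabeled or far away, which loosely mirrors the trade-offs the abstract reports.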
