From Notation to Gesture: Virtual Conductor Gesture Generation in VR via Structured Score Semantics
Abstract
A conductor avatar plays a dual role in immersive Virtual Reality (VR) interactive systems, interpreting musical scores and guiding orchestral performance. Rule-based score-driven methods achieve precise synchronization with predefined conducting templates or videos but are constrained by pre-authored data. Audio-driven frameworks offer greater adaptability through real-time gesture generation, yet they often fail to capture the symbolic semantics of musical scores. To overcome these limitations, we propose a novel score-driven gesture generation framework that translates symbolic musical representations into plausible conducting gestures. Our approach adopts a two-stage architecture, combining a contrastive learning stage that pre-trains a score encoder with a generative learning stage that synthesizes gestures. The score encoder explicitly models musical features such as tempo, chord, intensity, and cycle semantics, which directly inform gesture generation. To support this research, we introduce the Multimodal Symphonic Conducting Dataset (MSCD), the first synchronized dataset comprising conducting gestures, performance audio, and editable symbolic scores, effectively bridging the gap between musical semantics and gesture synthesis. Qualitative and quantitative analyses demonstrate the effectiveness of our approach, and a user study identifies the strengths and limitations of the current work.
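To make the two-stage design concrete, the sketch below illustrates the contrastive pre-training stage the abstract describes: a score encoder is aligned with a gesture encoder over synchronized (score, gesture) pairs via a symmetric InfoNCE loss, after which a generative model would be conditioned on the resulting score embeddings. This is a minimal illustrative sketch, not the authors' implementation; all module names, feature dimensions, and hyperparameters (ScoreEncoder, GestureEncoder, score_dim, motion_dim, the temperature) are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ScoreEncoder(nn.Module):
    """Maps per-beat symbolic features (tempo, chord, intensity, cycle)
    into a shared embedding space. Sizes are illustrative."""
    def __init__(self, score_dim=16, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(score_dim, 256), nn.ReLU(), nn.Linear(256, embed_dim)
        )

    def forward(self, x):  # x: (batch, score_dim)
        return F.normalize(self.net(x), dim=-1)

class GestureEncoder(nn.Module):
    """Maps a window of conducting motion into the same embedding space."""
    def __init__(self, motion_dim=63, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(motion_dim, 256), nn.ReLU(), nn.Linear(256, embed_dim)
        )

    def forward(self, x):  # x: (batch, motion_dim)
        return F.normalize(self.net(x), dim=-1)

def info_nce(score_emb, gesture_emb, temperature=0.07):
    """Symmetric InfoNCE: matched (score, gesture) pairs in the batch are
    positives; all other pairings serve as negatives."""
    logits = score_emb @ gesture_emb.t() / temperature
    targets = torch.arange(len(logits), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

# Stage 1: one illustrative pre-training step on synchronized pairs.
score_enc, gesture_enc = ScoreEncoder(), GestureEncoder()
opt = torch.optim.Adam(
    list(score_enc.parameters()) + list(gesture_enc.parameters()), lr=1e-4
)
scores = torch.randn(32, 16)   # stand-in for symbolic score features
motions = torch.randn(32, 63)  # stand-in for time-aligned gesture windows
loss = info_nce(score_enc(scores), gesture_enc(motions))
opt.zero_grad(); loss.backward(); opt.step()

In Stage 2, the pre-trained score encoder would be frozen (or fine-tuned) and its embeddings fed as conditioning to a gesture generator; the abstract does not specify the generator's form, so it is omitted here.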