Reinforcement Learning Enhanced Full-Duplex Spoken Dialogue Language Models for Conversational Interactions
Top Authors
Abstract
Mainstream spoken dialogue language models (SDLMs) primarily handle turn-based interactions by alternating between processing user speech and generating responses. Recently emerging full-duplex SDLMs have showcased more natural and engaging conversational performance by simultaneously listening and speaking. However, the complex dynamics of human conversation introduce unique challenges to full-duplex SDLMs: Beyond generating reasonable responses, these models must exhibit diverse and prompt conversational behaviors in real-time interactions with the user. In this work, we present an efficient full-duplex SDLM optimized by Online Reinforcement with Interactive Speech Evaluation (ORISE). In ORISE, we design a customized reward function derived from automated annotations of online generated speech to guide the model toward well-formed and speech-text aligned responses. Experimental results show that ORISE effectively improves robustness and accuracy in handling conversational dynamics, including turn-taking, user barge-in, and backchanneling. Furthermore, ORISE enables the model to adapt to unseen noise conditions without relying on any labeled data, demonstrating the generalization of ORISE in real-world scenarios.