ArtSpeak: An Interactive AR Application for Lifelike Speaking with Art Portraits
Abstract
Museum visits often lack personalized and interactive experiences, limiting visitor engagement with art and historical artifacts. To address this, we present ArtSpeak, a standalone augmented reality (AR) application that transforms traditional art viewing into an interactive storytelling experience. When users point their mobile cameras at an artwork, the system responds to their questions with lifelike, talking-head video narratives generated from historical portraits. However, generating such talking-head videos at runtime is computationally expensive, often requiring over a minute per response. To overcome this challenge, ArtSpeak makes two major contributions. First, it pre-generates a set of lifelike video responses to frequently asked questions (FAQs) for various art portraits. Second, it introduces a novel retrieval-based approach that uses GPT-based embeddings and cosine similarity to select the most relevant response. As a result, the system dynamically presents the video reply that best aligns with the user's inquiry, reducing computational overhead and ensuring a real-time, low-latency experience. More precisely, ArtSpeak achieves over 30× lower latency and reduces energy consumption by approximately 81% compared to real-time video generation. User studies further validate the system's effectiveness, with 85% of participants rating the retrieved responses as relevant to their queries and 90% reporting smooth video playback. These results highlight the efficiency and user satisfaction enabled by our retrieval-based approach.
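The retrieval step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the FAQ answers have already been embedded offline (here with toy vectors in place of GPT-based embeddings) and that each FAQ entry maps to a pre-rendered video identifier; the function and variable names are hypothetical.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_best_video(query_emb: np.ndarray,
                        faq_embs: list[np.ndarray],
                        video_ids: list[str]) -> tuple[str, float]:
    """Score the user query against every precomputed FAQ embedding
    and return the video id of the closest match with its score."""
    scores = [cosine_similarity(query_emb, e) for e in faq_embs]
    best = int(np.argmax(scores))
    return video_ids[best], scores[best]

# Toy example: two FAQ entries with 3-d stand-in embeddings.
faq_embs = [np.array([1.0, 0.0, 0.0]),   # "Who painted this portrait?"
            np.array([0.0, 1.0, 0.0])]   # "When was it created?"
video_ids = ["who_painted.mp4", "when_created.mp4"]

query_emb = np.array([0.9, 0.1, 0.0])    # embedding of the user's question
video, score = retrieve_best_video(query_emb, faq_embs, video_ids)
```

Because only an embedding lookup and a similarity search run at query time, the expensive talking-head generation stays entirely offline, which is the source of the latency and energy savings reported above.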