EEG2Gaussian: Decoding and Visualizing Visual-Evoked EEG for VR Scenes Using 3D Gaussian Splatting
Abstract
Decoding and visualizing brain activity evoked by visual stimuli is critical both for understanding neural mechanisms and for advancing brain-computer interfaces (BCIs). However, non-invasive signals such as the electroencephalogram (EEG) present significant challenges due to their inherently low signal-to-noise ratio. Although recent deep learning methods have made progress on this task, most approaches are confined to 2D visualizations that fail to capture the complexity of real-world 3D perception. In this work, we investigate the relationship between EEG signals and 3D visual stimuli presented in virtual reality (VR) scenes, aiming to extract task-relevant semantics from the EEG responses elicited by these stimuli. We introduce EEG2Gaussian, a novel framework for decoding and visualizing visual-evoked EEG signals by reconstructing immersive VR scenes with 3D Gaussian Splatting. The framework consists of three stages. The preprocessing stage removes noise and artifacts from raw EEG signals to provide cleaner input for subsequent processing. In the encoding stage, we propose a Neural Temporal-Frequency Encoder (NTF-Encoder) that extracts temporal and frequency features using fused channel and band attention mechanisms and disentangles them into high-level and low-level semantic representations. In the decoding stage, a 3D EEG Decoder takes these multi-level features, routed through separate pathways, as conditional inputs to guide the reconstruction of semantically consistent VR scenes. Furthermore, we construct a VR-EEG dataset that pairs real-time EEG recordings with VR scenes, and we analyze how different types of scenes affect EEG responses across frequency bands. Our experimental results show that EEG2Gaussian can reconstruct VR scenes that are semantically aligned with the visual stimuli. Ablation studies verify the effectiveness of channel and band attention in EEG feature encoding and demonstrate that combining high-level and low-level semantic features enhances the consistency and interpretability of the reconstructed scenes.
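The abstract does not specify the NTF-Encoder's internals, so the following PyTorch sketch is only one plausible reading of "fused channel and band attention": squeeze-and-excitation-style gates over the electrode and frequency-band axes of a time-frequency EEG tensor. The class name `ChannelBandAttention`, all tensor shapes, and the gating design are our assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ChannelBandAttention(nn.Module):
    """Hypothetical sketch of fused channel/band attention, assuming the
    EEG has been transformed into a time-frequency representation of
    shape (batch, channels, bands, time)."""

    def __init__(self, n_channels: int, n_bands: int, reduction: int = 4):
        super().__init__()
        # Squeeze-and-excitation-style gate over EEG electrodes.
        self.channel_gate = nn.Sequential(
            nn.Linear(n_channels, n_channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(n_channels // reduction, n_channels),
            nn.Sigmoid(),
        )
        # Analogous gate over frequency bands (e.g., delta through gamma).
        self.band_gate = nn.Sequential(
            nn.Linear(n_bands, max(n_bands // reduction, 1)),
            nn.ReLU(inplace=True),
            nn.Linear(max(n_bands // reduction, 1), n_bands),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, F, T) -- batch, EEG channels, frequency bands, time steps
        ch_weights = self.channel_gate(x.mean(dim=(2, 3)))   # (B, C)
        band_weights = self.band_gate(x.mean(dim=(1, 3)))    # (B, F)
        # Fuse the two attentions by reweighting both axes.
        x = x * ch_weights[:, :, None, None]
        x = x * band_weights[:, None, :, None]
        return x

# Usage: 32-channel EEG, 5 canonical bands, 256 time steps.
feat = torch.randn(8, 32, 5, 256)
attn = ChannelBandAttention(n_channels=32, n_bands=5)
out = attn(feat)  # same shape, attention-reweighted
```

In this reading, the attended tensor would then be split into the high-level and low-level semantic representations that condition the 3D EEG Decoder; how that disentanglement is performed is left to the paper's method section.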