Dual Focus Multiscale Attention for Object Detection in Mixed Reality: Leveraging Customizable Synthetic Datasets
Abstract
We propose a novel object detection framework tailored for mixed reality (MR), combining a customizable synthetic dataset with a lightweight attention-enhanced detection model. Our dataset generation pipeline composites planetary and telescope foregrounds onto hybrid real-synthetic backgrounds, enabling robust learning under the variable lighting and occlusion conditions common in educational MR environments. At the core of our architecture is the Dual Focus Multiscale Attention (DFMA) module, which simultaneously refines spatial and channel-wise features at multiple scales. Integrated into a You Only Look Once (YOLO)-based backbone and feature pyramid network (FPN), DFMA significantly improves feature discrimination while preserving real-time efficiency. On MS COCO, our model improves mean Average Precision averaged over Intersection over Union thresholds from 0.5 to 0.95 (mAP@0.5:0.95) over state-of-the-art nano detectors from 39.3% to 41.3% (±2%), at a cost of only +6% parameters and +3% GFLOPs, with a notable reduction in false positives on visually similar, low-texture objects. We further demonstrate real-time deployment in a Unity-based MR application, highlighting the system's effectiveness in immersive, astronomy-focused educational scenarios. Our results underscore the potential of synthetic data and multiscale attention to bridge accuracy, speed, and realism in next-generation MR systems.
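The abstract does not detail DFMA's internals, so the following PyTorch sketch is an illustration only, not the paper's implementation: it assumes a squeeze-and-excitation-style channel gate paired with spatial gates computed by depthwise convolutions at several kernel sizes, fused residually. The class name DFMASketch and all hyperparameters (scales 3/5/7, reduction 8) are hypothetical placeholders.

```python
# Minimal, assumption-laden sketch of a dual-focus multiscale attention block.
# NOT the authors' DFMA: the exact design is not given in the abstract.
import torch
import torch.nn as nn


class DFMASketch(nn.Module):
    """Illustrative dual-focus attention: channel gating + multiscale spatial gating."""

    def __init__(self, channels: int, scales=(3, 5, 7), reduction: int = 8):
        super().__init__()
        # Channel focus: global average pool -> bottleneck MLP -> sigmoid gate.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial focus: one depthwise conv per scale, each reduced to a
        # single-channel attention map; scales approximate multiscale context.
        self.spatial_gates = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels),
                nn.Conv2d(channels, 1, 1),
            )
            for k in scales
        )
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x_c = x * self.channel_gate(x)                   # channel-refined features
        spatial = sum(g(x) for g in self.spatial_gates)  # summed multiscale logits
        x_s = x * self.sigmoid(spatial)                  # spatially refined features
        return x + x_c + x_s                             # residual fusion of both foci


if __name__ == "__main__":
    block = DFMASketch(64)
    feat = torch.randn(1, 64, 40, 40)  # e.g., one FPN level's feature map
    print(block(feat).shape)           # torch.Size([1, 64, 40, 40])
```

In this sketch the block is shape-preserving, so it could be dropped after backbone stages or FPN levels without altering downstream heads; whether the paper fuses the two foci additively, sequentially, or otherwise is an open assumption here.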