Abstract
Modular object-centric representations are essential for human-like reasoning but are challenging to obtain under spatial ambiguities, e.g., due to occlusions and view ambiguities. However, addressing these challenges presents both theoretical and practical difficulties. We introduce a novel multi-view probabilistic approach that aggregates view-specific slots to capture invariant content information while simultaneously learning disentangled global viewpoint-level information. Unlike prior single-view methods, our approach resolves spatial ambiguities, provides theoretical guarantees for identifiability, and requires no viewpoint annotations. Extensive experiments on standard benchmarks and novel complex datasets validate our method's robustness and scalability.