ALCER3D: Adaptive Learning Constraints for Enhanced Retrieval of Complex Indoor 3D Scenarios
Falcon A.;Serra G.
2025-01-01
Abstract
The Metaverse is growing rapidly, giving rise to thousands of rich virtual universes. Finding a specific one is therefore difficult for users, making advanced search tools a necessity. Existing methods leverage contrastive learning to obtain functions that map a 3D scene and its textual descriptions into similar representations. However, Metaverse scenarios are complex, multimedia-rich 3D scenes containing many elements, which makes cross-modal alignment difficult. For instance, a museum dedicated to Van Gogh is unrelated to one dedicated to Warhol, yet it shares similarities with museums dedicated to Matisse or Monet. To make the mapping functions aware of these nuances, we propose a novel learning strategy that integrates Adaptive Optimization Constraints: we compute data-dependent distances using a language-based method that we design, and we enforce them between the representations at training time. This sets our approach apart from standard procedures, which enforce the same distance regardless of the data. We validate the effectiveness of our approach on two datasets: one including 6000 apartments, and a novel dataset of 3000 museums that we collect. We observe consistent improvements over existing methods. Moreover, we obtain better generalization on very complex scenarios: on the museums dataset, our approach obtains an average R@1 of 5.2%, compared to 1.2% obtained by existing methods. The source code is available at https://github.com/aliabdari/ALCER3D.
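To make the idea of data-dependent constraints concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hinge-based triplet loss over cosine similarities, and it assumes that the language-based method yields a similarity score in [0, 1] between the descriptions of two scenes; the abstract specifies neither. The names `adaptive_margin`, `adaptive_triplet_loss`, `desc_sim` and `base_margin` are hypothetical. The margin shrinks as two descriptions become more alike, so a Van Gogh museum is pushed less strongly away from a Monet museum than from a Warhol one.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def adaptive_margin(desc_sim, base_margin=0.2):
    """Assumed rule: the more similar two descriptions are (desc_sim near 1),
    the smaller the margin enforced between the corresponding pair."""
    return base_margin * (1.0 - desc_sim)


def adaptive_triplet_loss(scene_emb, text_emb, desc_sim, base_margin=0.2):
    """Bidirectional hinge loss with a per-pair margin.

    scene_emb[i] and text_emb[i] form a matching pair; desc_sim[i][j] is the
    (hypothetical) language-based similarity between descriptions i and j.
    """
    n = len(scene_emb)
    if n < 2:
        return 0.0
    # sim[i][j] = similarity between scene i and text j
    sim = [[cosine(s, t) for t in text_emb] for s in scene_emb]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            m = adaptive_margin(desc_sim[i][j], base_margin)
            # scene i as anchor, text j as negative
            total += max(0.0, m + sim[i][j] - sim[i][i])
            # text i as anchor, scene j as negative
            total += max(0.0, m + sim[j][i] - sim[i][i])
    return total / (2 * n * (n - 1))
```

Setting every entry of `desc_sim` to 0 recovers a standard fixed-margin loss, which corresponds to the "same distance" baseline that the abstract contrasts against.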


