Combining image and point cloud segmentation to improve heritage understanding

Bassier, M.; Mazzacca, G.; Battisti, R.; Malek, S.; Remondino, F.

doi:10.5194/isprs-Archives-XLVIII-2-W4-2024-49-2024

Current 2D and 3D semantic segmentation frameworks are developed and trained on specific benchmark datasets, often rich of synthetic data, and when they are applied to complex and real-world heritage scenarios they offer much lower accuracy than expected. In this work, we present and demonstrate an early and late fusion of methods for semantic segmentation in cultural heritage applications. We rely on image datasets, point clouds and BIM models. The early fusion utilizes multi-view rendering to generate RGBD imagery of the scene. In contrast, the late fusion approach merges image-based segmentation with a Point Transformer applied to point clouds. Two scenarios are considered and inference results show that predictions are primarily influenced by whether the scene has a predominantly geometric or texture-based signature, underscoring the necessity of fusion methods.