Is ImageNet Always the Best Option? An Overview on Transfer Learning Strategies for Document Layout Analysis
De Nardin A.; Zottin S.; Colombi E.; Piciarelli C.; Foresti G. L.
2024-01-01
Abstract
Semantic segmentation models have shown impressive performance in historical document layout analysis, but their effectiveness depends on access to a large number of high-quality annotated images for training. A popular way to address the lack of training data in other domains is transfer learning, which carries the knowledge learned from a large-scale, general-purpose dataset (e.g. ImageNet) over to a domain-specific task. However, this approach has been shown to give unsatisfactory results when the target task is completely unrelated to the data used for pre-training, which is the case in document layout analysis. For this reason, in the present paper we provide an overview of domain-specific transfer learning for document layout segmentation. In particular, we show that pre-training on document-related images leads to consistently better performance and faster convergence than either training from scratch or pre-training on a large, general-purpose dataset such as ImageNet.