SAGE-networks: Shape-aware geometric embeddings for writer re-identification in historical manuscripts
Colombi E.;Foresti G. L.
2026-01-01
Abstract
Re-identifying handwriting in ancient manuscripts is a complex open-set task, complicated by style variability, material degradation, writing tool inconsistencies, and the scarcity of large annotated datasets. Traditional writer identification methods often struggle with generalization in historical contexts, where handwriting can vary significantly due to temporal, material, or contextual changes, even across documents produced by the same scribe. To address this, we propose a novel design strategy, Shape-Aware Geometric Embeddings (SAGE), which enhances writer re-identification by enforcing shape consistency, and refer to architectures integrated with this module as SAGE-Networks. The SAGE module integrates a Siamese architecture with a shape-consistency mechanism that emphasizes geometric structure in the input images. A self-supervised loss enforces alignment between the Siamese and shape-aware embeddings, guiding the model to extract invariant structural features across writing samples. This approach improves robustness to stylistic drift, ink variation, and physical degradation, which are common challenges in historical corpora. To support this task, we also introduce a novel benchmark dataset, Scribe Re-ID20, comprising manuscript excerpts from 20 distinct writers spanning several centuries. The dataset includes probe-gallery pairs specifically designed for open-set re-identification. Experimental results show that SAGE-Networks achieve up to a 21.6% absolute improvement in Rank-1 accuracy over a standard Siamese baseline without shape guidance. These findings underscore the critical role of geometric priors in handwriting modeling and demonstrate that our shape-aware embedding strategy significantly improves performance in challenging historical settings. 
SAGE establishes a strong and extensible foundation for future research in writer re-identification and document image analysis.
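
As a rough illustration of two ideas mentioned in the abstract, the sketch below shows one possible form of an alignment term between a Siamese embedding and a shape-aware embedding, together with a Rank-1 accuracy computation over probe-gallery pairs. The abstract does not specify the loss or the matching rule, so the cosine-based loss, the nearest-neighbour matching, and all function names here are assumptions made for illustration, not the authors' implementation. The Rank-1 function also assumes every probe writer is present in the gallery, so it does not model the rejection step that an open-set protocol would need.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def alignment_loss(siamese_emb, shape_emb):
    """Hypothetical alignment term: 1 - cosine similarity.

    It is 0 when the two embeddings point in the same direction and
    grows as they diverge. The form of the loss is an assumption.
    """
    return 1.0 - cosine_similarity(siamese_emb, shape_emb)


def rank1_accuracy(probes, gallery):
    """Fraction of probes whose most similar gallery entry has the same writer.

    probes, gallery: lists of (writer_id, embedding) pairs.
    Assumes every probe writer appears in the gallery (no open-set rejection).
    """
    hits = 0
    for probe_id, probe_emb in probes:
        best_id, _ = max(
            gallery, key=lambda item: cosine_similarity(probe_emb, item[1])
        )
        if best_id == probe_id:
            hits += 1
    return hits / len(probes)


# Toy data: two writers, two-dimensional embeddings (made up for the example).
gallery = [("A", [1.0, 0.0]), ("B", [0.0, 1.0])]
probes = [("A", [0.9, 0.1]), ("B", [0.8, 0.2])]
score = rank1_accuracy(probes, gallery)
```

In this toy setup the second probe lies closer to writer A's gallery entry than to its own, so only one of the two probes is matched correctly.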


