Automatic Assignment of ICD-10 Codes to Diagnostic Texts using Transformers Based Techniques

Popescu M. H.; Roitero K.; Della Mea V.
2021-01-01

Abstract

A fundamental task in epidemiology, statistics, and health informatics is to associate a standardized meaning with textual expressions, enabling their retrieval, aggregation, and interpretation. Among the relevant expressions, those mentioning health conditions and diagnoses are of paramount importance and can be found in almost any clinical document, including death certificates. These expressions are usually coded with the International Classification of Diseases. In this paper we employ both classical Machine Learning (ML) and BERT-based models to automatically classify diagnostic texts extracted from death certificates. We show the effectiveness of the proposed approach through a set of experiments covering multiple sets of features and variants of the algorithms. Our results show that BERT-based models, in particular those pre-trained on the specific domain, outperform classical ML algorithms, reaching an Accuracy and F1-Score of 0.952 and 0.943, respectively.
2021
978-1-6654-0132-6

Use this identifier to cite or link to this document: https://hdl.handle.net/11390/1218642