On the nature of information access evaluation metrics: a unifying framework

Mizzaro S.
2020-01-01

Abstract

We provide a uniform, general, and complete formal account of evaluation metrics for ranking, classification, clustering, and other information access problems. We leverage concepts from measurement theory, such as scale types and permissible transformation functions, and capture the nature of evaluation metrics in many tasks through two formal definitions. These definitions lead to a distinction between two families of metrics and tasks and provide a comprehensive classification of the tasks proposed so far. We derive some theorems to analyze the suitability (or otherwise) of some common metrics. Within our model we can derive and explain the theoretical properties and drawbacks of state-of-the-art metrics for multiple tasks. The main contributions of this paper are that, unlike previous studies, the formalization is well grounded in a solid discipline; it is general, as it accounts for most effectiveness metrics as well as most existing tasks; and it allows important consequences about metrics and their limitations to be derived.

Use this identifier to cite or link to this document: https://hdl.handle.net/11390/1186131