The agreement between relevance assessors is an important but understudied topic in the Information Retrieval literature because of the limited data available about documents assessed by multiple judges. This issue has gained even more importance recently in light of crowdsourced relevance judgments, where it is customary to gather many relevance labels for each topic-document pair. In a crowdsourcing setting, agreement is often even used as a proxy for quality, although without any systematic verification of the conjecture that higher agreement corresponds to higher quality. In this paper we address this issue and we study in particular: the effect of topic on assessor agreement; the relationship between assessor agreement and judgment quality; the effect of agreement on ranking systems according to their effectiveness; and the definition of an agreement-aware effectiveness metric that does not discard information about multiple judgments for the same document as it typically happens in a crowdsourcing setting. © 2017 Copyright held by the owner/author(s).

Considering assessor agreement in IR evaluation

Maddalena, Eddy
;
Roitero, Kevin
;
Mizzaro, Stefano
2017-01-01

Abstract

The agreement between relevance assessors is an important but understudied topic in the Information Retrieval literature because of the limited data available about documents assessed by multiple judges. This issue has gained even more importance recently in light of crowdsourced relevance judgments, where it is customary to gather many relevance labels for each topic-document pair. In a crowdsourcing setting, agreement is often even used as a proxy for quality, although without any systematic verification of the conjecture that higher agreement corresponds to higher quality. In this paper we address this issue and we study in particular: the effect of topic on assessor agreement; the relationship between assessor agreement and judgment quality; the effect of agreement on ranking systems according to their effectiveness; and the definition of an agreement-aware effectiveness metric that does not discard information about multiple judgments for the same document as it typically happens in a crowdsourcing setting. © 2017 Copyright held by the owner/author(s).
2017
9781450344906
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11390/1126642
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 22
  • ???jsp.display-item.citation.isi??? 14
social impact