A Bayesian Two-Parameter Normal Ogive Model for Crowdsourced Fact-Checking
Michele Lambardi di San Miniato
Michela Battauz
Ruggero Bellio
Paolo Vidoni
2024-01-01
Abstract
In the internet age, it has become unprecedentedly easy for people to contribute to the spread of news through social media and other new means of self-expression. The demand for fast and accurate fact-checking has grown compared with the past, when information was under the control of a few actors (newspapers, television, etc.). As a possible solution, crowdsourced fact-checking relies on the crowd itself to rate the validity of claims. The task is closely tied to information technology, so it has long been in the domain of machine learning and data mining. We aim to provide a probabilistic framework that can quantify the uncertainty in classification. We present a statistical perspective on the problem based on the two-parameter normal ogive model, which we borrow from item response theory (IRT). The ogive model disentangles worker-specific rating behaviour from the intrinsic truthfulness of claims. The Bayesian framework is especially suited for handling the latent variables needed to model workers' behaviour and claims' features. The method is necessarily supervised, as experts must provide samples of correct judgements. The proposal implies an aggregation of crowd judgements that implicitly acts as a surrogate for the expert. Workers' contributions are automatically weighted differently, depending on their ability to emulate expert ratings. Here, we illustrate the method on a dataset of political claims, each rated by an expert along with crowd workers.
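As a brief sketch of the model the abstract refers to (the notation here is assumed for illustration, not taken from the paper): in the two-parameter normal ogive model from IRT, the probability that worker $i$ rates claim $j$ as true can be written as

$$\Pr(Y_{ij} = 1 \mid \theta_j) = \Phi\{a_i(\theta_j - b_i)\},$$

where $\theta_j$ is the latent truthfulness of claim $j$, $a_i$ and $b_i$ are worker $i$'s discrimination and threshold parameters, and $\Phi$ is the standard normal cumulative distribution function. Intuitively, a worker with a large $a_i$ tracks the latent truthfulness closely, so an aggregation based on this model gives that worker's ratings more weight.

The following minimal Python sketch (hypothetical sizes and variable names; not the authors' code) simulates ratings from this model and evaluates the Bernoulli log-likelihood that, combined with priors on $(a_i, b_i, \theta_j)$, would underlie a Bayesian treatment of the kind described:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)

    n_workers, n_claims = 50, 200              # hypothetical sizes
    a = rng.lognormal(0.0, 0.3, n_workers)     # worker discriminations (> 0)
    b = rng.normal(0.0, 1.0, n_workers)        # worker thresholds
    theta = rng.normal(0.0, 1.0, n_claims)     # latent claim truthfulness

    # P(Y_ij = 1) = Phi(a_i * (theta_j - b_i)): two-parameter normal ogive
    p = norm.cdf(a[:, None] * (theta[None, :] - b[:, None]))
    y = rng.binomial(1, p)                     # simulated crowd ratings

    # Bernoulli log-likelihood of the ratings; priors on (a, b, theta)
    # complete the Bayesian model and posterior inference quantifies
    # the uncertainty in classifying each claim
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log1p(-p))
    print(loglik)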