
In Crowd Veritas: Leveraging Human Intelligence To Fight Misinformation

SOPRANO, MICHAEL
2023-05-22

Abstract

The spread of online misinformation has important effects on the stability of democracy. The information consumed every day influences human decision-making processes. The sheer size of digital content on the web and social media, and the ability to access and share it immediately, make it difficult to perform timely fact-checking at scale. Indeed, fact-checking is a complex process that involves several activities. A long-term goal is to build a so-called human-in-the-loop system that copes with (mis)information by measuring truthfulness in real time (e.g., as items appear on social media, news outlets, and so on) using a combination of crowd-powered data, human intelligence, and machine learning techniques. In recent years, crowdsourcing has become a popular method to collect reliable truthfulness judgments in order to scale up and help study the manual fact-checking effort. This thesis first investigates whether human judges can detect and objectively categorize online (mis)information and which setting allows obtaining the best results. Then, the impact of cognitive biases on human assessors judging information truthfulness is addressed: a categorization of cognitive biases is proposed, together with countermeasures to combat their effects and a bias-aware judgment pipeline for fact-checking. Lastly, an approach that predicts information truthfulness and, at the same time, generates a natural language explanation supporting the prediction is proposed. The machine-generated explanations are evaluated to understand whether they help human assessors better judge the truthfulness of information items. A collaborative process between systems, crowd workers, and expert fact-checkers would provide a scalable and decentralized hybrid mechanism to cope with the increasing volume of online misinformation.
The spread of online misinformation has important effects on the stability of democracy. The sheer size of digital content on the web and social media, and the ability to access and share it immediately, make it difficult to perform timely fact-checking at scale. Truthfulness judgments are usually made by experts, such as journalists for political statements. A different approach is to rely on a (non-expert) crowd of human judges to perform fact-checking. This leads to the following research question: can such human judges detect and objectively categorize online (mis)information? Several extensive crowdsourcing studies are performed to answer it. Thousands of truthfulness judgments over two datasets are collected by recruiting a crowd of workers from crowdsourcing platforms, and the expert judgments are compared with the crowd ones. The results allow concluding that the workers are indeed able to do so. There is a limited understanding of the factors that influence worker participation in longitudinal studies across different crowdsourcing marketplaces. A large-scale survey aimed at understanding how such studies are performed using crowdsourcing is run across multiple platforms. The answers collected are analyzed from both a quantitative and a qualitative point of view. A list of recommendations for task requesters to conduct these studies effectively is provided, together with a list of best practices for crowdsourcing platforms. Truthfulness is a subtle matter: statements can be merely biased, imprecise, or wrong, and a one-dimensional truth scale cannot account for such differences. The crowd workers are therefore asked to judge seven different dimensions of truthfulness selected from the existing literature. The newly collected crowdsourced judgments show that the workers are indeed reliable when compared to an expert-provided gold standard. Cognitive biases are human processes that often help minimize the cost of making mistakes but can steer assessors away from an objective judgment of information. A review of the cognitive biases that might manifest during the fact-checking process is presented, together with a list of countermeasures that can be adopted. An exploratory study on the previously collected dataset is then performed. Its findings are used to formulate hypotheses about which individual characteristics of statements or judges and which cognitive biases may affect crowd workers' truthfulness judgments. The findings suggest that crowd workers' degree of belief in science has an impact, that they generally overestimate truthfulness, and that their judgments are indeed affected by various cognitive biases. Automated fact-checking systems to combat the spread of misinformation exist; however, their complexity usually makes them opaque to the end user, making it difficult to foster trust in the system. The E-BART model is introduced with the hope of making progress on this front. E-BART provides a truthfulness prediction for a statement and jointly generates a human-readable explanation. An extensive human evaluation of the impact of the explanations generated by the model is conducted, showing that they increase the human ability to spot misinformation. The whole set of data collected and analyzed in this thesis is publicly released to the research community at: https://doi.org/10.17605/OSF.IO/JR6VC.
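The abstract above repeatedly refers to one core pattern: crowd workers judge the truthfulness of statements, their judgments are aggregated per statement, and the aggregates are compared with an expert-provided gold standard. The Python sketch below is an illustration of that comparison pattern only; the numeric scale, the mean aggregation, the Spearman correlation, and all the data in it are assumptions made for the example, not details taken from the thesis or its datasets.

```python
"""Illustrative sketch (not the thesis code): aggregate crowd truthfulness
judgments per statement and compare the aggregates with expert labels."""
from collections import defaultdict
from statistics import mean

from scipy.stats import spearmanr  # assumed dependency for rank correlation

# Hypothetical crowd data: (statement_id, worker_id, truthfulness_score)
crowd_judgments = [
    ("s1", "w1", 4), ("s1", "w2", 5), ("s1", "w3", 4),
    ("s2", "w1", 1), ("s2", "w2", 2), ("s2", "w4", 1),
    ("s3", "w2", 3), ("s3", "w3", 2), ("s3", "w4", 3),
]

# Hypothetical expert gold labels on an ordinal scale (0 = false ... 5 = true)
expert_labels = {"s1": 5, "s2": 0, "s3": 2}

# Aggregate the crowd: one score per statement (mean of its workers' judgments)
per_statement = defaultdict(list)
for statement_id, _worker_id, score in crowd_judgments:
    per_statement[statement_id].append(score)
crowd_aggregate = {sid: mean(scores) for sid, scores in per_statement.items()}

# External agreement: rank correlation between crowd aggregates and expert labels
statement_ids = sorted(expert_labels)
crowd_scores = [crowd_aggregate[sid] for sid in statement_ids]
gold_scores = [expert_labels[sid] for sid in statement_ids]
rho, p_value = spearmanr(crowd_scores, gold_scores)
print(f"Spearman rho between crowd and experts: {rho:.2f} (p = {p_value:.3f})")
```

The thesis performs far richer analyses (multiple judgment scales, aggregation functions, agreement measures, and seven truthfulness dimensions); this sketch only shows the basic crowd-versus-expert comparison it builds on.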
crowdsourcing; misinformation; machine learning; cognitive bias
In Crowd Veritas: Leveraging Human Intelligence To Fight Misinformation / Michael Soprano, 2023 May 22. 35th cycle, Academic Year 2021/2022.
Files in this record:

File: Tesi definitiva_SOPRANO.pdf
Description: Thesis
Access: Open access
License: Creative Commons
Size: 10.06 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11390/1252385