Exploiting news to categorize tweets: Quantifying the impact of different news collections
Pavan, Marco; Mizzaro, Stefano; Scagnetto, Ivan
2016-01-01
Abstract
Short texts are hard to classify because they are full of abbreviations and newly coined acronyms. Text enrichment is emerging in the literature as a potentially useful remedy. This paper is part of a longer-term research effort that aims to understand how effective it is to enrich tweets using news, rather than the whole web, as the knowledge source. Because the choice of news collection can lead to very different outcomes in the enrichment process, we compare the impact of three features of such collections: volume, variety, and freshness. We show that all three features have a significant impact on categorization accuracy. Copyright © 2016 for the individual papers by the paper's authors.

File | Description | Type | License | Size | Format
---|---|---|---|---|---
newsIR16.pdf (open access) | Main article | Publisher's version (PDF) | Creative Commons | 492.2 kB | Adobe PDF
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.