Short Text Categorization Exploiting Contextual Enrichment and External Knowledge

MIZZARO, Stefano; PAVAN, Marco; SCAGNETTO, Ivan
2014-01-01

Abstract

We address the problem of categorizing short texts, such as those posted by users on social networks and microblogging platforms, focusing specifically on Twitter. Because short texts provide too few word occurrences and often contain abbreviations and acronyms, traditional classification methods based on the "Bag-of-Words" representation have limitations. Our proposed method enriches the original text with a new set of words, extracted from webpages of the same temporal context, to add semantic value. We then use those words to query Wikipedia, as an external knowledge base, with the final goal of categorizing the original text into a predefined set of Wikipedia categories. We also present a first experimental evaluation that confirms the effectiveness of the algorithm design and implementation choices and highlights some critical issues with short texts.
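The sparsity problem described in the abstract, and why adding context words can help, can be illustrated with a toy Bag-of-Words example. This is a minimal sketch under stated assumptions, not the authors' method: the category keyword profiles, the tweet, the "context" words standing in for terms extracted from same-day webpages, and the cosine-similarity scoring are all hypothetical, and the paper's actual pipeline queries Wikipedia rather than matching against hand-written profiles.

```python
from collections import Counter
from math import sqrt

def bag_of_words(text):
    """Naive Bag-of-Words: lower-case and split on whitespace."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def categorize(words, profiles):
    """Return (best category, scores); the category is None if nothing overlaps."""
    scores = {cat: cosine(words, prof) for cat, prof in profiles.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores

# Hypothetical category profiles (stand-ins for predefined Wikipedia categories).
profiles = {
    "Sports": bag_of_words("american football touchdown super bowl team league"),
    "Politics": bag_of_words("election government parliament vote policy"),
}

tweet = bag_of_words("can't believe that last-minute TD #SB48")
print(categorize(tweet, profiles)[0])  # None: the tweet shares no word with any profile

# Hypothetical enrichment words, as if extracted from webpages of the same period.
context = bag_of_words("super bowl touchdown football")
enriched = tweet + context
label, scores = categorize(enriched, profiles)
print(label, round(scores["Sports"], 3))  # Sports 0.478
```

The abbreviation "TD" and the hashtag never match the vocabulary of either category, so the raw tweet cannot be placed; only after the context words are added does any overlap, and hence any non-zero score, appear.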
Year: 2014
ISBN: 9781450330220
Files in this item:
File Size Format
p57-mizzaro.pdf

not available

Type: Post-print document
License: Not public
Size: 658.6 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11390/1036349
Warning: the data displayed have not been validated by the university.

Citations
  • PubMed Central: not available
  • Scopus: 14
  • Web of Science (ISI): not available