Effectiveness of Data Enrichment on Categorization: Two Case Studies on Short Texts and User Movements - Udine, 2017 Apr 03. 28th cycle

Effectiveness of Data Enrichment on Categorization: Two Case Studies on Short Texts and User Movements

Pavan, Marco
2017-04-03

Abstract

The widespread diffusion of mobile devices, such as smartphones and tablets, has enabled a huge increase in user-generated data. Nowadays, about a billion users interact daily on online social media, where they share information and discuss a wide variety of topics, sometimes including the places they visit. Furthermore, mobile devices make available a large amount of data tracked by integrated sensors, which monitor several user activities, again including position. The content produced by users is composed of few elements, such as a few words in a social post or a single GPS position, and is therefore a poor source of information to analyze. On this basis, a data enrichment process may provide additional knowledge by exploiting other related sources.

The aim of this dissertation is to analyze the effectiveness of data enrichment for categorization, in particular in two domains: short texts and user movements. We describe the concept behind our experimental design, in which users’ content is represented as abstract objects in a geometric space, with distances representing relatedness and similarity values, and contexts representing regions close to each object where it is possible to find other related objects, which are therefore suitable as a data enrichment source.

Regarding short texts, our research involves a novel approach to short text enrichment and categorization, and an extensive study of the properties of the data used as enrichment. We analyze the temporal context and a set of properties that characterize data from an external source, in order to properly select and extract additional knowledge related to the textual content that users produce. We use Twitter as the source of short texts to build the datasets for all experiments.

Regarding user movements, we address the problem of place categorization, recognizing important locations that users visit frequently and intensively. We propose a novel approach to place categorization based on a feature space that models users’ movement habits. We analyze both temporal and spatial context to find additional information to use as data enrichment and to improve the importance recognition process. We use an in-house dataset of GPS logs and the public GeoLife dataset for our experiments.

Experimental evaluations in both studies highlight how the enrichment phase has a considerable impact on each process, and the results demonstrate its effectiveness. In particular, the short text analysis shows that news articles are particularly suitable as an enrichment source, and that their freshness is an important property to consider. The user movement analysis demonstrates that context with additional data helps, even with user trajectories that are difficult to analyze.

Finally, we provide an early-stage study on user modeling. We exploit the data extracted through enrichment of the short texts to build a richer user profile. The enrichment phase, combined with a network-based approach, improves the profiling process, providing higher scores in similarity computation where expected.
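
The geometric view described in the abstract (objects in a space, distances as relatedness, contexts as nearby regions used for enrichment) can be illustrated with a minimal sketch. This is not the method of the thesis: the bag-of-words representation, the cosine distance, the fixed `radius` threshold, and all function names are assumptions chosen only to make the idea concrete.

```python
import math
from collections import Counter


def vectorize(text):
    """Represent a text as bag-of-words term counts (an assumed, simplistic feature space)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """Distance in [0, 1] for count vectors; smaller means more related."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def context(obj, candidates, radius):
    """Return the candidates lying within `radius` of `obj` (its assumed context region)."""
    return [c for c in candidates if cosine_distance(obj, c) <= radius]


def enrich(obj, candidates, radius):
    """Merge the terms of all objects in the context into a copy of `obj`."""
    enriched = Counter(obj)
    for neighbor in context(obj, candidates, radius):
        enriched.update(neighbor)
    return enriched


# Hypothetical data: one short post and two candidate external documents.
post = vectorize("earthquake hits city")
news = [
    vectorize("earthquake hits city center causing damage"),
    vectorize("football team wins cup final"),
]
enriched_post = enrich(post, news, radius=0.5)
```

In this toy example only the first document falls inside the context of the post, so the enriched representation gains terms such as "damage" while the unrelated document is ignored. A categorizer would then operate on the enriched vector instead of the original short text.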
Enrichment; Categorization; Short texts; User movements; User modeling
Files in this item:
10990_819_Pavan_PhD_thesis.pdf
Access: open access
Type: Doctoral thesis
License: Not specified
Size: 2.74 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11390/1132153