Beyond word embeddings: A survey

Incitti F. (first author; Writing – Original Draft Preparation); Snidaro L. (last author; Writing – Review & Editing)
2023-01-01

Abstract

This paper surveys methods for representing text, focusing on embeddings for texts of different lengths and, specifically, on work that goes beyond word embeddings. Analyzing longer pieces of text can be more challenging than analyzing single words because several additional factors come into play. Consequently, representations of longer texts can be obtained with different strategies that leverage information beyond what is used for single words. A text is defined by its components and by how they are combined, and both should be taken into account when integrating information into a single document embedding. Multimodal approaches are also described, showing how information of different kinds (aural, visual, and knowledge-based) can be fused to obtain enriched representations. The survey aims to help readers navigate the methods proposed in the literature and understand which strategies best suit specific needs.

Use this identifier to cite or link to this document: https://hdl.handle.net/11390/1235306