Recent advancements in Artificial Intelligence have led to several breakthroughs in many heterogeneous scientific fields, such as the prediction of protein structures or self-driving cars. These results are obtained by means of Machine Learning techniques, which make it possible to automatically learn from the available annotated examples a mathematical model capable of solving the task. One of its sub-fields, Deep Learning, brought further improvements by providing the possibility to also compute an informative and non-redundant representation for each example by means of the same learning process. To successfully solve the task under analysis, the model needs to overcome the generalization gap, meaning that it needs to work well both on the training data, and on examples which are drawn from the same distribution but are never observed at training time. Several heuristics are often used to overcome this gap, such as the introduction of inductive biases when modeling the data or the usage of regularization techniques; however, a popular way consists in collecting and annotating more examples hoping they can cover the cases which were not previously observed. In particular, recent state-of-the-art solutions use hundreds of millions or even billions of annotated examples, and the underlying trend seems to imply that the collection and annotation of more and more examples should be the prominent way to overcome the generalization gap. However, there are many fields, e.g. medical fields, in which it is difficult to collect such a large amount of examples, and producing high quality annotations is even more arduous and costly. During my Ph.D. and in this thesis, I designed and proposed several solutions which address the generalization gap in three different domains by leveraging semantic aspects of the available data. In particular, the first part of the thesis includes techniques which create new annotations for the data under analysis: these include data augmentation techniques, which are used to compute variations of the annotations by means of semantics-preserving transformations, and transfer learning, which is used in the scope of this thesis to automatically generate textual descriptions for a set of images. In the second part of the thesis, this gap is reduced by customizing the training objective based on the semantics of the annotations. By means of these customizations, a problem is shifted from the commonly used single-task setting to a multi-task learning setting by designing an additional task, and then two variations of a standard loss function are proposed by introducing semantic knowledge into the training process.

Semantics for vision-and-language understanding / Alex Falcon , 2023 Mar 13. 35. ciclo, Anno Accademico 2021/2022.

Semantics for vision-and-language understanding

FALCON, ALEX
2023-03-13

Abstract

Recent advancements in Artificial Intelligence have led to several breakthroughs in many heterogeneous scientific fields, such as the prediction of protein structures or self-driving cars. These results are obtained by means of Machine Learning techniques, which make it possible to automatically learn from the available annotated examples a mathematical model capable of solving the task. One of its sub-fields, Deep Learning, brought further improvements by providing the possibility to also compute an informative and non-redundant representation for each example by means of the same learning process. To successfully solve the task under analysis, the model needs to overcome the generalization gap, meaning that it needs to work well both on the training data, and on examples which are drawn from the same distribution but are never observed at training time. Several heuristics are often used to overcome this gap, such as the introduction of inductive biases when modeling the data or the usage of regularization techniques; however, a popular way consists in collecting and annotating more examples hoping they can cover the cases which were not previously observed. In particular, recent state-of-the-art solutions use hundreds of millions or even billions of annotated examples, and the underlying trend seems to imply that the collection and annotation of more and more examples should be the prominent way to overcome the generalization gap. However, there are many fields, e.g. medical fields, in which it is difficult to collect such a large amount of examples, and producing high quality annotations is even more arduous and costly. During my Ph.D. and in this thesis, I designed and proposed several solutions which address the generalization gap in three different domains by leveraging semantic aspects of the available data. In particular, the first part of the thesis includes techniques which create new annotations for the data under analysis: these include data augmentation techniques, which are used to compute variations of the annotations by means of semantics-preserving transformations, and transfer learning, which is used in the scope of this thesis to automatically generate textual descriptions for a set of images. In the second part of the thesis, this gap is reduced by customizing the training objective based on the semantics of the annotations. By means of these customizations, a problem is shifted from the commonly used single-task setting to a multi-task learning setting by designing an additional task, and then two variations of a standard loss function are proposed by introducing semantic knowledge into the training process.
13-mar-2023
video and language; deep learning; multimedia
Semantics for vision-and-language understanding / Alex Falcon , 2023 Mar 13. 35. ciclo, Anno Accademico 2021/2022.
File in questo prodotto:
File Dimensione Formato  
PhD_thesis-4.pdf

accesso aperto

Descrizione: Tesi (versione post minor-revision)
Licenza: Creative commons
Dimensione 14.27 MB
Formato Adobe PDF
14.27 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11390/1252364
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact