Data-driven vocal folds models for the representation of both acoustic and high speed video data

DRIOLI, Carlo; FORESTI, Gian Luca
2015-01-01

Abstract

The aim of this paper is to evaluate the effectiveness of a class of data-driven physical models to represent both acoustic and high-speed video data of the voice production process. Voice production analysis through numerical models of the phonation process is nowadays a mature research field, and reliable dynamical glottal models of different accuracy and complexity are available. Although they are traditionally used to represent the acoustic emission during phonation, the biomechanical nature of the modeling makes them well suited to also represent high-speed video recordings of the vocal fold oscillations. We discuss here a data-driven, numerically simulated model of the fold motion within an audio-video data analysis context. A model structure is proposed which is based on physical knowledge and data-driven machine learning components. A model inversion algorithm is designed that exploits acoustic data related to the glottal excitation and high-speed video data of the folds to estimate the parameters of the model and to represent the phonation characteristics. It is shown how machine learning techniques can be effectively used in combination with biomechanical modeling in order to match the observed data. The method is assessed on data from different subjects uttering sustained vowels.
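The model-inversion idea described in the abstract — adjusting the parameters of a simulated glottal model until its output matches observed data — can be illustrated with a generic sketch. The pulse shape, parameter names (`f0`, `open_quotient`), and grid-search fitting below are illustrative assumptions, not the biomechanical model or estimation algorithm from the paper.

```python
import numpy as np

def glottal_area(t, f0, open_quotient):
    """Idealized glottal-area pulse train: a raised-cosine pulse that is
    open for a fraction `open_quotient` of each cycle, closed otherwise.
    A generic stand-in for a simulated fold-motion model."""
    phase = (t * f0) % 1.0
    return np.where(
        phase < open_quotient,
        0.5 * (1.0 - np.cos(2.0 * np.pi * phase / open_quotient)),
        0.0,
    )

def fit_by_grid_search(t, observed):
    """Estimate (f0, open_quotient) by exhaustive search, standing in for
    the model-inversion step that matches simulated and observed data."""
    best, best_err = None, np.inf
    for f0 in np.arange(80.0, 301.0, 1.0):       # plausible pitch range, Hz
        for oq in np.arange(0.30, 0.81, 0.01):   # plausible open quotients
            err = np.mean((glottal_area(t, f0, oq) - observed) ** 2)
            if err < best_err:
                best, best_err = (f0, oq), err
    return best

fs = 4000.0                              # sampling rate of the area signal
t = np.arange(0.0, 0.1, 1.0 / fs)
observed = glottal_area(t, 120.0, 0.6)   # synthetic "observed" recording
f0_hat, oq_hat = fit_by_grid_search(t, observed)
```

In practice a gradient-based or learned optimizer would replace the brute-force grid, and the error term would combine acoustic and video residuals, but the fit-simulate-compare loop is the same.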
Year: 2015
ISBN: 9781479919604
Files in this record:
File: GlModel_AudVid_IJCNN15_R1.pdf (not available; copy available on request)
Type: Pre-print document
License: Non-public
Size: 954.08 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11390/1071317

Citations
  • Scopus: 2
  • Web of Science (ISI): 1