Synthesis of voiced sounds by means of waveform adaptive physical models

DRIOLI, Carlo
2003-01-01

Abstract

The reproduction of voiced sounds by physical modeling is addressed, with a major focus on fitting a physically constrained model to real voice samples. A source-filter scheme is adopted in which the vocal tract is represented by an all-pole filter and the voice source model relies on a lumped mechano-aerodynamic scheme inspired by the mass-spring paradigm. The vocal folds are represented by a mechanical resonator plus a delay line that accounts for vertical phase differences. The vocal fold displacement is coupled to the glottal flow by means of a general parametric nonlinear model. An adaptive, data-driven identification procedure is outlined, in which the model parameters are tuned to accurately match the target speech waveform. The simultaneous optimization of the source and vocal tract parameters is discussed. A recursive algorithm based on the Kalman filtering approach is proposed and evaluated, and its performance on time-varying voiced signals is discussed.
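
As a rough illustration of the source-filter scheme mentioned in the abstract, the sketch below passes an excitation through an all-pole filter. It is not the paper's model: the impulse-train source is a placeholder for the physical vocal fold model, and the function names, the single-formant filter, and all numeric values (sampling rate, pitch, formant frequency, bandwidth) are assumptions chosen only for the example.

```python
import math


def allpole_filter(excitation, a):
    """All-pole filter: y[n] = x[n] - sum_k a[k-1] * y[n-k], k = 1..len(a)."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y.append(acc)
    return y


def resonator_coeffs(freq_hz, bandwidth_hz, fs):
    """Denominator coefficients [a1, a2] of a two-pole resonance (one formant)."""
    r = math.exp(-math.pi * bandwidth_hz / fs)  # pole radius, < 1 for stability
    theta = 2.0 * math.pi * freq_hz / fs        # pole angle
    return [-2.0 * r * math.cos(theta), r * r]


def pulse_train(f0_hz, fs, n_samples):
    """Placeholder source: one unit impulse per pitch period (NOT a fold model)."""
    period = round(fs / f0_hz)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


# Hypothetical values, chosen only for illustration.
fs = 16000
source = pulse_train(100, fs, 320)
tract = resonator_coeffs(500, 80, fs)
voiced = allpole_filter(source, tract)
```

In an identification setting such as the one the abstract outlines, the filter coefficients and the source parameters would be the quantities adapted to match a target waveform; here they are simply fixed by hand.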

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11390/682880