Emotional FESTIVAL-MBROLA TTS synthesis

Tesser, F.; Cosi, P.; Drioli, C.; Tisato, G.

This work extends our previous research on a general data-driven procedure for creating a neutral "narrative-style" prosodic module for the Italian FESTIVAL Text-To-Speech (TTS) synthesizer. It investigates and implements new strategies for building an emotional FESTIVAL TTS. As in the neutral case, the new emotional prosodic modules are based on "Classification And Regression Tree" (CART) theory. The extension to emotional speech synthesis uses a differential approach: the emotional prosodic modules learn the differences between the neutral (emotion-free) and the emotional prosodic data. Moreover, because Voice Quality (VQ) is known to play an important role in emotive speech, a rule-based FESTIVAL-MBROLA VQ-modification module, which controls temporal and spectral characteristics of the synthesis, has also been implemented. Although emotional synthesis remains an open issue, our preliminary evaluation results indicate that the proposed solution is effective.
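
The differential approach can be illustrated with a minimal sketch. It assumes that each phone has a neutral duration and an emotional duration, and that a small regression tree is trained on their difference; at synthesis time the predicted difference is added to the neutral value. The feature names (stressed, phrase_final), the toy durations, and the tiny tree learner below are hypothetical and chosen only for illustration; they are not the FESTIVAL implementation or the data used in this work.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, depth=0, max_depth=2):
    """Grow a small regression tree by minimizing the SSE of each split."""
    if depth >= max_depth or len(y) < 2 or sse(y) == 0.0:
        return {"value": mean(y)}
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every observed value except the largest.
        for t in sorted(set(row[j] for row in X))[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:  # no feature can separate the rows
        return {"value": mean(y)}
    _, j, t, left, right = best
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           depth + 1, max_depth),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            depth + 1, max_depth),
    }


def predict(tree, x):
    """Follow the splits down to a leaf and return its stored value."""
    while "value" not in tree:
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["value"]


# Hypothetical toy data: features are [stressed, phrase_final].
features = [[0, 0], [0, 1], [1, 0], [1, 1]]
neutral_ms = [80.0, 100.0, 110.0, 130.0]    # assumed neutral durations
emotional_ms = [85.0, 115.0, 135.0, 165.0]  # assumed emotional durations

# Differential target: the tree learns emotional minus neutral.
deltas = [e - n for e, n in zip(emotional_ms, neutral_ms)]
delta_tree = build_tree(features, deltas)


def emotional_duration(neutral_value, x):
    """Add the predicted emotional offset to a neutral prediction."""
    return neutral_value + predict(delta_tree, x)
```

In this sketch the neutral value is passed in directly; in a full system it would come from the neutral prosodic module, so that the emotional module only has to model the offset.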