Voice GMM modelling of voice quality for FESTIVAL/MBROLA emotive TTS synthesis (Conference proceedings contribution)

Type
Label
  • Voice GMM modelling of voice quality for FESTIVAL/MBROLA emotive TTS synthesis (Conference proceedings contribution) (literal)
Year
  • 2006-01-01T00:00:00+01:00 (literal)
Alternative label
  • Mauro Nicolao; Carlo Drioli; Piero Cosi (2006)
    Voice GMM modelling of voice quality for FESTIVAL/MBROLA emotive TTS synthesis
    in Interspeech 2006 -- ICSLP - 9th International Conference on Spoken Language Processing, Pittsburgh, PA, USA, 17-21 September 2006
    (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#autori
  • Mauro Nicolao; Carlo Drioli; Piero Cosi (literal)
First page
  • 1794 (literal)
Last page
  • 1797 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#altreInformazioni
  • Article in ISI conference proceedings (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#url
  • http://www.isca-speech.org/archive/interspeech_2006/i06_1597.html (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#titoloVolume
  • 9th International Conference on Spoken Language Processing (Interspeech 2006 -- ICSLP) (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#volumeInCollana
  • 1/2006 (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#pagineTotali
  • 4 (literal)
Note
  • Scopus (literal)
  • ISI Web of Science (WOS) (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#affiliazioni
  • Istituto di Scienze e Tecnologie della Cognizione - Sede di Padova "Fonetica e Dialettologia", Consiglio Nazionale delle Ricerche, Via G. Anghinoni, 10 - 35121 Padova, Italy (literal)
Title
  • Voice GMM modelling of voice quality for FESTIVAL/MBROLA emotive TTS synthesis (literal)
Http://www.cnr.it/ontology/cnr/pubblicazioni.owl#isbn
  • 978-1-60423-449-7 (literal)
Abstract
  • Voice quality is recognized as playing an important role in the rendering of emotions in verbal communication. In this paper we explore the effectiveness of a processing framework for voice transformations aimed at the analysis and synthesis of emotive speech. We use a GMM-based model to compute the differences between an MBROLA voice and an angry voice, and we address the modification of the MBROLA voice spectra by using a set of spectral conversion functions trained on the data. We propose to organize the training speech data in such a way that the target emotive speech data and the diphone database used for text-to-speech synthesis both come from the same speaker. A copy-synthesis procedure is used to produce synthetic speech utterances in which pitch patterns, phoneme durations, and principal speaker characteristics are the same as in the target emotive utterances. This results in a better isolation of the voice quality differences due to emotive arousal. Three different models for representing voice quality differences are applied and compared. The models are all based on a GMM representation of the acoustic space. The performance of these models is discussed, and the experimental results and assessment are presented. (literal)
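The abstract refers to GMM-based spectral conversion functions without giving their form. As a rough illustration only, the sketch below shows one formulation that is common in the voice conversion literature: regression under a joint-density GMM, reduced here to a single scalar spectral feature for brevity. This record does not say whether it corresponds to any of the three models compared in the paper. The names `JointComponent`, `posteriors`, and `convert` are hypothetical, and the mixture parameters are assumed to have been trained already on time-aligned neutral/emotive frames.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class JointComponent:
    """One component of a joint GMM over (source x, target y); hypothetical layout."""
    weight: float  # mixture weight
    mu_x: float    # mean of the source (neutral) feature
    mu_y: float    # mean of the target (emotive) feature
    var_x: float   # variance of the source feature
    cov_xy: float  # covariance between source and target features


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posteriors(x: float, components: List[JointComponent]) -> List[float]:
    """Posterior probability of each component given the source feature x."""
    likes = [c.weight * gaussian_pdf(x, c.mu_x, c.var_x) for c in components]
    total = sum(likes)
    return [like / total for like in likes]


def convert(x: float, components: List[JointComponent]) -> float:
    """Posterior-weighted sum of per-component linear regressions from x to y."""
    post = posteriors(x, components)
    return sum(
        p * (c.mu_y + (c.cov_xy / c.var_x) * (x - c.mu_x))
        for p, c in zip(post, components)
    )
```

With a single component the function reduces to an ordinary linear regression from the source feature to the target feature; with several components, each source frame is mapped by a soft mixture of local regressions. A real system would work on spectral feature vectors with full or diagonal covariance matrices rather than on one scalar.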