Towards a Vowel Formant Based Quality Metric for Text-to-Speech Systems: Measuring Monophthong Naturalness.

Sven Albrecht, Rewa Tamboli, Stefan Taubert, Maximilian Eibl, Günter Daniel Rey, Josef Schmied
Published in: CIVEMSA (2022)
Keyphrases
  • text to speech
  • speech synthesis
  • prosodic features
  • quality metrics
  • quality assessment
  • image retrieval
  • post processing
  • visual quality
  • feature extraction
  • image quality
  • retrieval systems
  • structural similarity