Predicting phoneme-level prosody latents using AR and flow-based Prior Networks for expressive speech synthesis.

Konstantinos Klapsas, Karolos Nikitaras, Nikolaos Ellinas, June Sig Sung, Inchul Hwang, Spyros Raptis, Aimilios Chalamandaris, Pirros Tsiakoulis
Published in: CoRR (2022)
Keyphrases
  • speech synthesis
  • speech recognition
  • text to speech
  • prosodic features
  • vocal tract
  • augmented reality
  • social networks
  • speech corpus
  • computer vision