DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training.
Hyung-Seok Oh, Sang-Hoon Lee, Seong-Whan Lee. Published in: CoRR (2023)
Keyphrases
- speech synthesis
- text to speech
- speech recognition
- prosodic features
- vocal tract
- speech corpus
- multi agent
- online learning
- latent variables
- training process
- training set
- generation process
- diffusion process
- training phase
- supervised learning
- hidden markov models
- test set
- training samples
- random variables
- active learning
- training examples
- multiscale
- learning algorithm