DiffProsody: Diffusion-Based Latent Prosody Generation for Expressive Speech Synthesis With Prosody Conditional Adversarial Training.
Hyung-Seok Oh, Sang-Hoon Lee, Seong-Whan Lee
Published in: IEEE ACM Trans. Audio Speech Lang. Process. (2024)
Keyphrases
- speech synthesis
- text to speech
- speech recognition
- prosodic features
- vocal tract
- random field model
- training phase
- hidden Markov models
- generation process
- anisotropic diffusion
- training process
- diffusion process
- supervised learning
- probabilistic model
- training set
- training algorithm
- feature extraction
- diffusion model
- pattern recognition
- support vector
- multi agent