HierSpeech: Bridging the Gap between Text and Speech by Hierarchical Variational Inference using Self-supervised Representations for Speech Synthesis.
Sang-Hoon Lee, Seung-Bin Kim, Ji-Hyun Lee, Eunwoo Song, Min-Jae Hwang, Seong-Whan Lee. Published in: NeurIPS (2022)
Keyphrases
- speech synthesis
- text to speech
- variational inference
- speech recognition
- bayesian inference
- vocal tract
- prosodic features
- topic models
- posterior distribution
- probabilistic model
- probabilistic graphical models
- gaussian process
- variational methods
- latent dirichlet allocation
- closed form
- mixture model
- information retrieval
- text mining
- exact inference
- exponential family
- pattern recognition
- language model
- word processing
- maximum likelihood
- markov networks
- approximate inference
- coarse to fine