StyleSpeech: Self-supervised Style Enhancing with VQ-VAE-based Pre-training for Expressive Audiobook Speech Synthesis.
Xueyuan Chen, Xi Wang, Shaofei Zhang, Lei He, Zhiyong Wu, Xixin Wu, Helen Meng
Published in: CoRR (2023)
Keyphrases
- speech synthesis
- speech recognition
- vector quantization
- text to speech
- vocal tract
- prosodic features
- training examples
- image compression
- image coding
- data sets
- supervised learning
- training process
- training set
- online learning
- hidden markov models
- artificial neural networks
- training algorithm
- vector quantizer
- training phase
- multiscale
- neural network