StyleSpeech: Self-Supervised Style Enhancing with VQ-VAE-Based Pre-Training for Expressive Audiobook Speech Synthesis.
Xueyuan Chen, Xi Wang, Shaofei Zhang, Lei He, Zhiyong Wu, Xixin Wu, Helen Meng
Published in: ICASSP (2024)
Keyphrases
- speech synthesis
- speech recognition
- vector quantization
- text to speech
- vocal tract
- training examples
- prosodic features
- training set
- image compression
- image coding
- neural network
- test set
- training algorithm
- training phase
- speech corpus
- training process
- training samples
- supervised learning
- pattern recognition
- support vector