nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-Shot Multi-speaker Text-to-Speech.
Botao Zhao, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Published in: ICASSP (2022)
Keyphrases
- prosodic features
- text to speech
- speech synthesis
- speaker verification
- speech recognition
- audio visual
- speaker recognition
- automatic speech recognition
- general purpose
- text to speech synthesis
- speaker diarization
- image segmentation
- English text
- visual speech
- spontaneous speech
- programming tool
- word processing
- acoustic features
- speaker identification
- noisy environments
- edge detection
- bayesian networks