nnSpeech: Speaker-Guided Conditional Variational Autoencoder for Zero-shot Multi-speaker Text-to-Speech.
Botao Zhao, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Published in: CoRR (2022)
Keyphrases
- prosodic features
- text to speech
- speech synthesis
- speaker verification
- speech recognition
- speaker recognition
- audio visual
- text to speech synthesis
- word processing
- high level
- speaker identification
- noisy environments
- variational methods
- multi modal
- optical flow
- speaker diarization
- random field model
- spontaneous speech
- image segmentation
- computer vision