Prosody and Voice Factorization for Few-Shot Speaker Adaptation in the Challenge M2voc 2021.
Tao Wang, Ruibo Fu, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Chunyu Qiang, Shiming Wang
Published in: ICASSP (2021)
Keyphrases
- text to speech
- speaker adaptation
- visual object classes
- speech recognition
- speech synthesis
- synthesized speech
- video data
- speaker dependent
- speaker independent
- key frames
- image processing
- temporal segmentation
- video shots
- video content
- maximum likelihood
- hidden markov models
- neural network
- automatic speech recognition
- vocal tract
- prosodic features
- natural language processing
- video sequences