Spoken Content and Voice Factorization for Few-Shot Speaker Adaptation.
Tao Wang, Jianhua Tao, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, Rongxiu Zhong
Published in: INTERSPEECH (2020)
Keyphrases
- speaker adaptation
- speech recognition
- maximum likelihood
- automatic speech recognition
- video sequences
- video data
- speaker dependent
- video shots
- video content
- key frames
- speech recognizer
- speaker independent
- speech synthesis
- machine learning
- visual features
- non-stationary
- visual information
- hidden Markov models
- image retrieval