Content-Dependent Fine-Grained Speaker Embedding for Zero-Shot Speaker Adaptation in Text-to-Speech Synthesis.
Yixuan Zhou, Changhe Song, Xiang Li, Luwen Zhang, Zhiyong Wu, Yanyao Bian, Dan Su, Helen Meng
Published in: CoRR (2022)
Keyphrases
- fine grained
- speaker adaptation
- text to speech synthesis
- speech recognition
- coarse grained
- speaker dependent
- maximum likelihood
- automatic speech recognition
- access control
- vector space
- multimedia
- user intent
- speech recognizer
- text to speech
- speaker independent
- data lineage
- language model
- hidden markov models
- semantic information
- neural network
- feature vectors
- pattern recognition
- speaker identification
- speaker verification
- search engine
- information retrieval