Incorporating speaker embedding and post-filter network for improving speaker similarity of personalized speech synthesis system.
Sheng-Yao Wang, Yi-Chin Huang
Published in: ROCLING (2021)
Keyphrases
- speech synthesis
- prosodic features
- speech recognition
- vocal tract
- text to speech
- speaker verification
- speaker identification
- hidden Markov models
- audio visual
- similarity measure
- rank order
- computer vision
- complex networks
- speaker recognition
- user profiles
- speaker diarization
- speech corpus
- individual user
- multidimensional scaling
- noisy environments
- automatic speech recognition
- speech signal
- network model
- network structure
- vector space
- peer to peer
- language model
- wireless sensor networks
- pattern recognition
- noise reduction
- high dimensional
- image processing