Accent and Speaker Disentanglement in Many-to-many Voice Conversion.
Zhichao Wang, Wenshuo Ge, Xiong Wang, Shan Yang, Wendong Gan, Haitao Chen, Hai Li, Lei Xie, Xiulin Li
Published in: ISCSLP (2021)
Keyphrases
- speech recognition
- automatic speech recognition
- prosodic features
- speech synthesis
- synthesized speech
- speech sounds
- mel frequency cepstral coefficients
- voice activity detection
- speaker identification
- speaker verification
- text to speech
- speech signal
- speaker recognition
- spoken language
- noisy environments
- language model
- speaker dependent
- audio visual
- emotion recognition
- hidden Markov models
- speaker diarization
- fundamental frequency
- feature space
- database
- broadcast news
- vocal tract
- speech quality
- multi modal
- video sequences
- image sequences
- voice and data services
- information systems