Unifying One-Shot Voice Conversion and Cloning with Disentangled Speech Representations.
Hui Lu, Xixin Wu, Haohan Guo, Songxiang Liu, Zhiyong Wu, Helen Meng
Published in: ICASSP (2024)
Keyphrases
- text to speech
- emotion recognition
- speech synthesis
- speech recognition
- speech recognition errors
- voice activity detection
- speech sounds
- speech quality
- speech signal
- audio visual
- fundamental frequency
- text to speech synthesis
- language acquisition
- spoken dialogue systems
- multi modal
- endpoint detection
- information systems
- prosodic features
- real time
- recognition engine
- vocal tract
- hidden markov models
- multimodal interaction
- spoken language
- higher level