A unified one-shot prosody and speaker conversion system with self-supervised discrete speech units.
Li-Wei Chen, Shinji Watanabe, Alexander Rudnicky
Published in: CoRR (2022)
Keyphrases
- prosodic features
- synthesized speech
- speech synthesis
- audio visual
- speech recognition
- text to speech
- speaker verification
- vocal tract
- speaker recognition
- automatic speech recognition
- speaker identification
- multi stream
- spontaneous speech
- multi modal
- speech signal
- speaker dependent
- discrete version
- automatic speech recognition systems
- noisy environments
- speaker diarization
- structured light
- discrete geometry
- speaker adaptation
- language model
- multiscale
- discrete space
- emotion recognition
- speaker independent
- automatic transcription
- pattern recognition