A unified one-shot prosody and speaker conversion system with self-supervised discrete speech units.

Li-Wei Chen, Shinji Watanabe, Alexander Rudnicky

Published in: CoRR (2022)

Keyphrases

prosodic features
synthesized speech
speech synthesis
audio visual
speech recognition
text to speech
speaker verification
vocal tract
speaker recognition
automatic speech recognition
speaker identification
multi stream
spontaneous speech
multi modal
speech signal
speaker dependent
discrete version
automatic speech recognition systems
noisy environments
speaker diarization
structured light
discrete geometry
speaker adaptation
language model
multiscale
discrete space
emotion recognition
speaker independent
automatic transcription
pattern recognition