A Unified One-Shot Prosody and Speaker Conversion System with Self-Supervised Discrete Speech Units
Li-Wei Chen, Shinji Watanabe, Alexander Rudnicky. Published in: ICASSP (2023)
Keyphrases
- prosodic features
- synthesized speech
- speech synthesis
- audio visual
- text to speech
- speech recognition
- speaker verification
- vocal tract
- speaker recognition
- automatic speech recognition
- spontaneous speech
- multi stream
- speaker identification
- speech signal
- multi modal
- speaker dependent
- speech sounds
- language model
- automatic speech recognition systems
- speaker diarization
- hidden markov models
- noisy environments
- digit recognition
- word processing
- mel frequency cepstral coefficients
- discrete geometry
- information retrieval
- human machine interaction
- continuous domains
- video sequences