IQDUBBING: Prosody modeling based on discrete self-supervised speech representation for expressive voice conversion.
Wendong Gan, Bolong Wen, Ying Yan, Haitao Chen, Zhichao Wang, Hongqiang Du, Lei Xie, Kaixuan Guo, Hai Li. Published in: CoRR (2022)
Keyphrases
- text to speech
- speech synthesis
- synthesized speech
- prosodic features
- speech recognition
- audio visual
- text to speech synthesis
- translation invariant
- representation scheme
- model construction
- continuous domains
- emotion recognition
- data sets
- speaker recognition
- finite number
- fundamental frequency
- temporal aspects
- dynamic bayesian networks
- discrete geometry
- multi stream
- vocal tract
- morphological skeleton
- gaussian mixture model
- image representation
- language model