ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-supervised Speech Representations.
Shehzeen Hussain, Paarth Neekhara, Jocelyn Huang, Jason Li, Boris Ginsburg
Published in: CoRR (2023)
Keyphrases
- text to speech
- speech recognition errors
- emotion recognition
- speech recognition
- voice activity detection
- real time
- audio visual
- higher level
- speech synthesis
- fundamental frequency
- learning algorithm
- symbolic representation
- automatic speech recognition
- facial expressions
- multi stream
- spontaneous speech
- recognition engine
- case study
- text to speech synthesis