Zero-Shot Face-Based Voice Conversion: Bottleneck-Free Speech Disentanglement in the Real-World Scenario.
Shao-En Weng, Hong-Han Shuai, Wen-Huang Cheng
Published in: AAAI (2023)
Keyphrases
- real world
- text to speech
- emotion recognition
- recognition engine
- speech synthesis
- voice activity detection
- speech recognition
- data sets
- facial expressions
- wide range
- speech recognition errors
- voice recognition
- synthetic data
- speech sounds
- speech quality
- case study
- fundamental frequency
- human faces
- speaker recognition
- data mining
- pattern recognition
- automatic speech recognition
- audio visual
- object recognition
- endpoint detection
- text to speech synthesis
- real time