Multimodal Target Speech Separation with Voice and Face References.
Leyuan Qu, Cornelius Weber, Stefan Wermter
Published in: CoRR (2020)
Keyphrases
- text to speech
- audio visual
- emotion recognition
- recognition engine
- multimodal interaction
- multimodal interfaces
- multimodal fusion
- voice recognition
- facial expressions
- speech recognition errors
- speech quality
- human faces
- speech synthesis
- speech recognition
- face biometrics
- prosodic features
- multi stream
- fundamental frequency
- visual speech
- voice activity detection
- facial images
- multi modal
- facial gestures
- speech sounds
- endpoint detection
- target object
- facial animation
- multimedia
- speech signal
- biometric systems
- moving target