Multimodal Target Speech Separation with Voice and Face References.
Leyuan Qu, Cornelius Weber, Stefan Wermter
Published in: INTERSPEECH (2020)
Keyphrases
- text to speech
- audio visual
- recognition engine
- emotion recognition
- multimodal interaction
- speech synthesis
- speech recognition
- speech recognition errors
- speech quality
- multimodal interfaces
- multimodal fusion
- fundamental frequency
- facial expressions
- speech sounds
- multi modal
- voice activity detection
- visual speech
- face verification
- target tracking
- multi stream
- speech signal
- multimodal biometrics
- human faces
- face biometrics
- synthesized speech
- automatic speech recognition
- facial gestures
- facial images
- face images
- speaker identification
- human computer interaction
- voice recognition
- moving target
- multimedia
- text to speech synthesis
- spoken language
- audio features