Multimodal Attention Fusion for Target Speaker Extraction.
Hiroshi Sato, Tsubasa Ochiai, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Shoko Araki
Published in: CoRR (2021)
Keyphrases
- audio visual
- multimodal fusion
- multi modal
- information extraction
- visual attention
- information fusion
- data fusion
- automatically extracted
- automatic extraction
- speaker verification
- multimodal biometrics
- single modality
- multimodal medical images
- fusion method
- speaker recognition
- fusion methods
- multimodal interaction
- moving target
- automatic speech recognition
- neural network
- knowledge extraction
- target object
- speech recognition
- pattern recognition