Multimodal Attention Fusion for Target Speaker Extraction.
Hiroshi Sato, Tsubasa Ochiai, Keisuke Kinoshita, Marc Delcroix, Tomohiro Nakatani, Shoko Araki
Published in: SLT (2021)
Keyphrases
- audio visual
- multimodal fusion
- multi modal
- multimodal biometrics
- speech recognition
- data fusion
- visual information
- multimodal medical images
- information extraction
- target recognition
- neural network
- knowledge extraction
- automatically extracted
- visual data
- multimodal interaction
- focus of attention
- target tracking
- moving target
- target object
- multi sensor
- visual attention
- medical images
- pattern recognition
- image processing