Attention-Based Keyword Localisation in Speech using Visual Grounding.
Kayode Olaleye, Herman Kamper
Published in: CoRR (2021)
Keyphrases
- selective attention
- visual information
- keywords
- visual features
- speech signal
- speech recognition
- visual cues
- audio visual
- focus of attention
- human vision
- visual field
- spoken language
- speech synthesis
- saliency map
- visual attention
- data sets
- visual saliency
- low level
- broadcast news
- content based video retrieval
- autistic children