AS-Net: active speaker detection using deep audio-visual attention.
Abduljalil Radman, Jorma Laaksonen
Published in: Multim. Tools Appl. (2024)
Keyphrases
- visual attention
- saliency map
- eye tracking
- eye movements
- audio visual
- visual search
- focus of attention
- natural scenes
- higher level
- visual attention model
- visual perception
- vision system
- visual information
- visual saliency detection
- speaker identification
- visual saliency
- speech recognition
- detection method
- visual scene
- multimedia
- prosodic features
- event detection
- object detection
- salient regions
- eye tracking data
- attention mechanism
- biological vision systems
- object recognition
- saliency detection
- feature extraction
- real time