Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention.
Xubo Liu, Qiushi Huang, Xinhao Mei, Haohe Liu, Qiuqiang Kong, Jianyuan Sun, Shengchen Li, Tom Ko, Yu Zhang, H. Lilian Tang, Mark D. Plumbley, Volkan Kiliç, Wenwu Wang
Published in: INTERSPEECH (2023)
Keyphrases
- visual attention
- multimedia
- saliency map
- eye tracking
- natural scenes
- vision system
- focus of attention
- audio visual
- eye movements
- visual data
- visual perception
- salient regions
- attention mechanism
- visual attention model
- visual information
- higher level
- visual search
- visual saliency
- visual field
- visual scene
- computer vision
- image representation
- machine vision systems
- visual saliency detection
- object based visual attention