Leveraging Topics and Audio Features with Multimodal Attention for Audio Visual Scene-Aware Dialog.
Shachi H. Kumar, Eda Okur, Saurav Sahay, Jonathan Huang, Lama Nachman
Published in: ViGIL@NeurIPS (2019)
Keyphrases
- audio features
- visual scene
- audio visual
- visual information
- visual attention
- music retrieval
- visual features
- low level
- text data
- multi modal
- feature set
- music information retrieval
- eye movements
- object recognition
- vision system
- saliency map
- visual data
- sound source
- focus of attention
- eye tracking
- higher level
- complex scenes
- topic models
- keywords
- audio content
- information retrieval
- automatic music genre classification
- natural images
- natural scenes
- high level
- image collections
- contextual information
- multimedia
- image classification
- user interface
- natural language
- feature extraction