Leveraging Topics and Audio Features with Multimodal Attention for Audio Visual Scene-Aware Dialog.
Shachi H. Kumar, Eda Okur, Saurav Sahay, Jonathan Huang, Lama Nachman
Published in: CoRR (2019)
Keyphrases
- audio features
- audio visual
- visual scene
- visual information
- visual attention
- music retrieval
- visual features
- low level
- multi modal
- text data
- visual data
- focus of attention
- feature set
- saliency map
- music information retrieval
- eye movements
- vision system
- natural scenes
- sound source
- complex scenes
- user interface
- user interests
- audio content
- information retrieval
- topic models
- object recognition
- keywords
- eye tracking
- image classification
- personalized recommendation
- multimedia
- real time
- image collections
- text documents
- document collections
- natural language
- computer vision