Exploring Context, Attention and Audio Features for Audio Visual Scene-Aware Dialog.
Shachi H. Kumar, Eda Okur, Saurav Sahay, Jonathan Huang, Lama Nachman
Published in: CoRR (2019)
Keyphrases
- audio features
- visual scene
- audio visual
- visual attention
- visual information
- low level
- visual features
- genre classification
- multi modal
- music information retrieval
- vision system
- context aware
- contextual information
- feature set
- visual data
- computer vision
- complex scenes
- high level
- multimedia
- audio content
- automatic music genre classification
- focus of attention
- text data
- image collections
- spatial relations
- higher level