Investigating topics, audio representations and attention for multimodal scene-aware dialog.
Shachi H. Kumar, Eda Okur, Saurav Sahay, Jonathan Huang, Lama Nachman
Published in: Comput. Speech Lang. (2020)
Keyphrases
- probabilistic model
- topic models
- visual data
- audio visual
- multimedia
- multimodal information
- d scene
- video sequences
- scene change detection
- single image
- multimodal fusion
- semantic context
- multi modal
- input image
- three dimensional
- complex scenes
- multi stream
- information retrieval
- cross modal
- object models
- scene analysis
- natural language
- spoken dialog
- music retrieval
- video scene
- scene classification
- outdoor scenes
- saliency map
- visual information
- story segmentation
- d objects
- signal processing
- real scenes
- scene understanding
- multimodal interaction
- visual attention
- keywords
- visual features