Situated reference resolution using visual saliency and crowdsourcing-based priors for a spoken dialog system within vehicles.
Teruhisa Misu
Published in: Comput. Speech Lang. (2018)
Keyphrases
- reference resolution
- visual saliency
- Bayesian framework
- saliency map
- relation extraction
- natural images
- visual attention
- natural language processing
- human detection
- prior knowledge
- visual search
- eye movements
- named entity recognition
- language understanding
- generative model
- information extraction
- real time
- coreference resolution
- maximum a posteriori
- automatic extraction
- region of interest
- natural language text
- focus of attention
- video summarization
- machine learning
- higher level
- expectation maximization
- object detection
- object recognition
- multiscale
- Bayesian networks