Unsupervised Grounding of Textual Descriptions of Object Features and Actions in Video.
Muhannad Al-Omari, Eris Chinellato, Yiannis Gatsoulis, David C. Hogg, Anthony G. Cohn
Published in: KR (2016)
Keyphrases
- textual descriptions
- object features
- metadata
- semantic representation
- semantic concepts
- web images
- 3D objects
- semantic information
- image features
- multimedia
- keywords
- video data
- multi-modal
- semi-supervised
- object recognition
- video sequences
- feature extraction
- visual features
- machine learning
- natural language processing
- video frames
- multimedia content
- object representation
- visual scene
- search engine