Joint Commonsense and Relation Reasoning for Image and Video Captioning.
Jingyi Hou, Xinxiao Wu, Xiaoxun Zhang, Yayun Qi, Yunde Jia, Jiebo Luo
Published in: AAAI (2020)
Keyphrases
- image data
- multiscale
- single image
- template matching
- image content
- input image
- image representation
- image features
- video frames
- image collections
- image regions
- video surveillance
- image classification
- image segmentation
- static images
- image analysis
- video sequences
- similarity measure
- image matching
- video images
- image set
- key frames
- visual cues
- image pixels
- visual data
- weakly labeled
- automated reasoning
- commonsense knowledge
- computer vision
- test images
- space time
- video data
- low level
- high resolution
- image retrieval
- multimedia
- knowledge base
- region of interest
- spatial information
- segmentation method
- video analysis
- image database
- object motion