Stack-VS: Stacked Visual-Semantic Attention for Image Caption Generation.
Wei Wei, Ling Cheng, Xianling Mao, Guangyou Zhou, Feida Zhu
Published in: CoRR (2019)
Keyphrases
- visual features
- input image
- image classification
- image data
- single image
- visual appearance
- low level
- image analysis
- image collections
- image features
- image content
- semantic information
- visual concepts
- visual perception
- test images
- high resolution
- image retrieval
- multiscale
- image representation
- visual cues
- low level visual features
- high level semantics
- web images
- high level
- semantic similarity
- visual similarity
- region of interest
- auto annotation
- semantic space
- visually similar
- selective attention
- keypoints
- image regions
- semantic gap
- spatial information
- visual data
- semantic content
- feature points
- object recognition
- visual information
- bounding box
- image segmentation
- semantic labels
- pixel values