Image Captioning with Visual Object Representations Grounded in the Textual Modality.
Dusan Varis, Katsuhito Sudoh, Satoshi Nakamura
Published in: CoRR (2020)
Keyphrases
- object representations
- image features
- image content
- low-level
- image data
- input image
- single image
- multiscale
- image retrieval
- image classification
- image segmentation
- complex objects
- real world objects
- image regions
- feature points
- spatial information
- similarity measure
- object representation
- pixel-wise
- image representation
- object categorization
- spatial relations
- keypoints
- multi-modal
- co-occurrence
- three-dimensional