Image Captioning with Visual Object Representations Grounded in the Textual Modality.

Dusan Varis, Katsuhito Sudoh, Satoshi Nakamura

Published in: CoRR (2020)

Keyphrases

object representations
image features
image content
low-level
image data
input image
single image
multiscale
image retrieval
image classification
image segmentation
complex objects
real world objects
image regions
feature points
spatial information
similarity measure
object representation
pixel-wise
image representation
object categorization
spatial relations
keypoints
multi-modal
co-occurrence
three-dimensional