Multi-modal reward for visual relationships-based image captioning.
Ali Abedi, Hossein Karshenas, Peyman Adibi
Published in: CoRR (2023)
Keyphrases
- multi modal
- auto annotation
- single modality
- cross modal
- uni modal
- image data
- image features
- input image
- image segmentation
- multi modality
- fusing multiple
- image analysis
- multiple modalities
- video search
- low level
- image retrieval
- similarity measure
- web images
- visual data
- image annotation
- image content
- image regions
- multiscale
- segmentation method
- audio visual
- visual cues
- visual information
- automatic image annotation
- image classification
- semantic concepts
- spatial relationships
- segmentation algorithm
- image representation
- edge detection
- high resolution
- image collections