Learning to Generate Grounded Image Captions without Localization Supervision.
Chih-Yao Ma, Yannis Kalantidis, Ghassan AlRegib, Peter Vajda, Marcus Rohrbach, Zsolt Kira
Published in: CoRR (2019)
Keyphrases
- learning algorithm
- template matching
- image content
- object localization
- input image
- learning process
- prior knowledge
- active learning
- image features
- multiscale
- visual features
- image representation
- high resolution
- image data
- image analysis
- edge detection
- similarity measure
- image regions
- Hough transform
- precise localization
- image classification
- supervised learning
- image segmentation