Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning.
Chia-Wen Kuo, Zsolt Kira
Published in: CoRR (2022)
Keyphrases
- input image
- image features
- image data
- image content
- image classification
- bounding box
- image retrieval
- image segmentation
- object detection
- multiscale
- image regions
- keypoints
- image set
- visual data
- multimedia
- image collections
- image representation
- similarity measure
- video sequences
- video data
- spatial information
- image matching
- high level
- test images
- pairwise
- high resolution