Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for Image Captioning.

Chia-Wen Kuo, Zsolt Kira

Published in: CoRR (2022)

Keyphrases

input image
image features
image data
image content
image classification
bounding box
image retrieval
image segmentation
object detection
multiscale
image regions
keypoints
image set
visual data
multimedia
image collections
image representation
similarity measure
video sequences
video data
spatial information
image matching
high level
test images
pairwise
high resolution