Stacked Cross-modal Feature Consolidation Attention Networks for Image Captioning.
Mozhgan PourKeshavarz, Shahabedin Nabavi, Mohsen Ebrahimi Moghaddam, Mehrnoush Shamsfard
Published in: CoRR (2023)
Keyphrases
- image features
- cross modal
- image data
- image retrieval
- multiscale
- image classification
- visual data
- image collections
- image content
- image regions
- test images
- image segmentation
- input image
- object recognition
- image representation
- multi modal
- similarity measure
- information retrieval
- visual similarity
- keypoints
- spatial information
- high dimensional
- image search
- feature selection
- computer vision