CAST: Cross-Modal Retrieval and Visual Conditioning for image captioning.
Shan Cao, Gaoyun An, Yigang Cen, Zhaoqilin Yang, Weisi Lin
Published in: Pattern Recognit. (2024)
Keyphrases
- cross modal
- image retrieval
- visual similarity
- multi modal
- visual data
- visual features
- perceptual information
- image data
- image content
- multimedia retrieval
- web images
- visual concepts
- image classification
- image features
- image database
- image representation
- visual content
- semantic similarity
- multimedia databases
- multiscale
- low level
- image regions
- visual information
- information retrieval
- relevance feedback
- similarity measure
- high level
- video data
- query expansion
- automatic image annotation
- image search
- semantic concepts
- semantic gap
- visual recognition
- keywords