Improving Cross-modal Alignment with Synthetic Pairs for Text-only Image Captioning.
Zhiyue Liu, Jinyuan Liu, Fanrong Ma
Published in: CoRR (2023)
Keyphrases
- cross modal
- image retrieval
- multiscale
- visual similarity
- image features
- image data
- image representation
- image classification
- web images
- image content
- image segmentation
- similarity measure
- image collections
- image regions
- text retrieval
- visual data
- multi modal
- image database
- low level
- image set
- semantic similarity
- semantic information
- information retrieval
- visual features
- multimedia retrieval