Improving Cross-Modal Alignment with Synthetic Pairs for Text-Only Image Captioning.
Zhiyue Liu, Jinyuan Liu, Fanrong Ma
Published in: AAAI (2024)
Keyphrases
- cross-modal
- visual similarity
- image data
- image classification
- image retrieval
- image features
- image regions
- multiscale
- web images
- image collections
- computer vision
- visual data
- image content
- multiple modalities
- information retrieval
- multi-modal
- image segmentation
- text retrieval
- test collection
- keywords
- similarity measure