Bridging the gap: dual perception attention and local-global similarity fusion for cross-modal image-text matching.
Xiangyu Shui, Zhenfang Zhu, Yun Liu, Hongli Pei, Kefeng Li, Huaxiang Zhang
Published in: Multim. Tools Appl. (2024)
Keyphrases
- cross modal
- visual similarity
- matching score
- image features
- image matching
- similarity measure
- image data
- perceptual information
- image content
- image retrieval
- image representation
- image classification
- keypoints
- web images
- search engine
- multiscale
- image collections
- information retrieval
- image segmentation
- text retrieval
- test images
- image regions
- multi modal
- image set
- low level
- scene classification
- automatic image annotation
- multimedia retrieval
- keywords
- multiple modalities