AGREE: Aligning Cross-Modal Entities for Image-Text Retrieval Upon Vision-Language Pre-trained Models.
Xiaodan Wang, Lei Li, Zhixu Li, Xuwu Wang, Xiangru Zhu, Chengyu Wang, Jun Huang, Yanghua Xiao
Published in: WSDM (2023)
Keyphrases
- text retrieval
- pre-trained
- image retrieval
- cross-modal
- multimedia retrieval
- image features
- multi-modal
- image content
- training data
- image classification
- image collections
- probabilistic model
- visual similarity
- document collections
- document retrieval
- low level
- machine learning
- image representation
- image regions
- multimedia information retrieval
- query expansion
- information retrieval