Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training.
Haowei Liu
Yaya Shi
Haiyang Xu
Chunfeng Yuan
Qinghao Ye
Chenliang Li
Ming Yan
Ji Zhang
Fei Huang
Bing Li
Weiming Hu
Published in:
CoRR (2024)
Keyphrases
cross modal
image features
image retrieval
image data
multiscale
image segmentation
image content
visual similarity
image representation
image collections
image classification
test images
low level
multi modal
spatial relationships
visual data
video sequences
spatial information
multimedia retrieval