Unicoder-VL: A Universal Encoder for Vision and Language by Cross-Modal Pre-Training.
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Published in: AAAI (2020)
Keyphrases
cross modal
multi modal
computer vision
multimedia retrieval
perceptual information
image retrieval
search engine
natural language
multimedia databases
visual data
image data
visual features
text retrieval
image search
visual similarity