Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training.
Gen Li
Nan Duan
Yuejian Fang
Daxin Jiang
Ming Zhou
Published in: CoRR (2019)
Keyphrases
cross modal
multi modal
multimedia retrieval
computer vision
image retrieval
multimedia databases
natural language
training set
visual recognition
visual data
visual similarity
perceptual information
image sequences
supervised learning