VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix.
Teng Wang
Wenhao Jiang
Zhichao Lu
Feng Zheng
Ran Cheng
Chengguo Yin
Ping Luo
Published in:
CoRR (2022)
Keyphrases
cross modal
multi modal
computer vision
training set
image retrieval
visual recognition
multimedia retrieval
multimedia
natural language
feature space
low level
image data
data management
multimedia databases
visual data
perceptual information