mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video.
Haiyang Xu
Qinghao Ye
Ming Yan
Yaya Shi
Jiabo Ye
Yuanhong Xu
Chenliang Li
Bin Bi
Qi Qian
Wei Wang
Guohai Xu
Ji Zhang
Songfang Huang
Fei Huang
Jingren Zhou
Published in:
CoRR (2023)
Keyphrases
multi modal
image analysis
video search
high level
multiscale
image data
multiple modalities
image classification
similarity measure
low level
uni modal
multimedia
fusing multiple
visual data
video retrieval
image collections
video streams
mutual information
image registration
high resolution
feature space