mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video.
Haiyang Xu
Qinghao Ye
Ming Yan
Yaya Shi
Jiabo Ye
Yuanhong Xu
Chenliang Li
Bin Bi
Qi Qian
Wei Wang
Guohai Xu
Ji Zhang
Songfang Huang
Fei Huang
Jingren Zhou
Published in:
ICML (2023)
Keyphrases
multi modal
video search
image features
image classification
image segmentation
multiscale
video sequences
low level
multiple modalities
image analysis
high resolution
image data
semantic concepts
visual cues
multi modality
auto annotation