CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment.
Hongwei Xue
Yuchong Sun
Bei Liu
Jianlong Fu
Ruihua Song
Houqiang Li
Jiebo Luo
Published in: CoRR (2022)
Keyphrases
pre-trained
probabilistic model
image representation
image classification
image features
EM algorithm
single image
statistical model
data sets
image segmentation
training data
image retrieval
input image
video data
image matching
image set