CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising.
Jianjie Luo
Yehao Li
Yingwei Pan
Ting Yao
Hongyang Chao
Tao Mei
Published in:
CoRR (2021)
Keyphrases
cross-modal
denoising
multimodal
visual data
video data
semantic concepts
video streams
multimedia
video sequences
multimedia retrieval
image retrieval
visual recognition
visual features
training set
visual similarity
multimedia databases
video content
image processing
search engine
natural images
image annotation
video retrieval
key frames
video analysis
multimedia data
keypoints
space time
training examples