CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising.
Jianjie Luo, Yehao Li, Yingwei Pan, Ting Yao, Hongyang Chao, Tao Mei
Published in: CoRR (2021)
Keyphrases
- cross-modal
- denoising
- multimodal
- visual data
- video data
- semantic concepts
- video streams
- multimedia
- video sequences
- multimedia retrieval
- image retrieval
- visual recognition
- visual features
- training set
- visual similarity
- multimedia databases
- video content
- image processing
- search engine
- natural images
- image annotation
- video retrieval
- key frames
- video analysis
- multimedia data
- keypoints
- space-time
- training examples