CoCo-BERT: Improving Video-Language Pre-training with Contrastive Cross-modal Matching and Denoising.
Jianjie Luo, Yehao Li, Yingwei Pan, Ting Yao, Hongyang Chao, Tao Mei
Published in: ACM Multimedia (2021)
Keyphrases
- cross modal
- denoising
- multi modal
- visual data
- video sequences
- multimedia
- image retrieval
- video content
- semantic concepts
- video frames
- video analysis
- multimedia databases
- multimedia retrieval
- video data
- natural images
- multiscale
- key frames
- visual similarity
- training set
- training examples
- video retrieval
- multimedia data
- information retrieval
- high dimensional
- feature selection