Cross-modal Pretraining and Matching for Video Understanding.
Limin Wang
Published in: MMPT@ICMR (2021)
Keyphrases
- cross-modal
- multi-modal
- visual data
- multimedia
- video data
- video sequences
- multimedia databases
- image retrieval
- multimedia retrieval
- video content
- multimedia data
- video streams
- visual recognition
- key frames
- video retrieval
- video frames
- information retrieval
- human activities
- visual similarity
- spatio-temporal
- perceptual information