Video-Text Pre-training with Learned Regions.
Rui Yan, Mike Zheng Shou, Yixiao Ge, Alex Jinpeng Wang, Xudong Lin, Guanyu Cai, Jinhui Tang
Published in: CoRR (2021)
Keyphrases
- text detection
- natural language descriptions
- video streams
- text mining
- video content
- video sequences
- real time
- dynamic textures
- text retrieval
- video data
- news video
- image regions
- video search
- video analysis
- unsupervised manner
- set of training images
- pre-trained
- information retrieval
- video clips
- multimedia
- image features
- video retrieval
- video database
- multimedia search
- keywords
- text documents
- video objects
- video segments
- video scene
- training stage
- audio content
- training set
- moving objects
- input image
- multimedia documents
- video surveillance
- training phase
- training process