Align and Prompt: Video-and-Language Pre-training with Entity Prompts.
Dongxu Li, Junnan Li, Hongdong Li, Juan Carlos Niebles, Steven C. H. Hoi
Published in: CoRR (2021)
Keyphrases
- multimedia
- video data
- video sequences
- natural language
- video content
- real time
- video streams
- video analysis
- training phase
- video database
- training process
- video frames
- named entities
- knowledge base
- video images
- video clips
- training samples
- online learning
- programming language
- key frames
- co-occurrence
- video retrieval
- natural language processing
- supervised learning
- object detection
- object-oriented
- spatio-temporal
- coreference resolution
- decision trees
- pre-trained