Multi-stage Multi-modal Pre-training for Video Representation.
Chunquan Chen, Lujia Bao, Weikang Li, Xiaoshuai Chen, Xinghai Sun, Chao Qi
Published in: NLPCC (2) (2021)
Keyphrases
- multi modal
- multistage
- video representation
- spatio temporal
- multi modality
- dynamic programming
- space time
- high dimensional
- video streams
- video analysis
- audio visual
- video database
- video content
- image annotation
- generative model
- motion patterns
- semantic concepts
- spatial information
- optimal policy
- key frames
- multimedia databases
- video processing
- visual vocabulary
- machine learning