VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending.
Xingjian He, Sihan Chen, Fan Ma, Zhicheng Huang, Xiaojie Jin, Zikang Liu, Dongmei Fu, Yi Yang, Jing Liu, Jiashi Feng
Published in: CoRR (2023)
Keyphrases
- video streams
- video frames
- video sequences
- language learning
- video data
- programming language
- motion features
- natural language
- video analysis
- real time
- space time
- training samples
- video content
- video surveillance
- video database
- multiple features
- training phase
- data sets
- test set
- feature vectors
- image features
- supervised learning
- feature set
- training examples
- key frames
- human activities
- spatial and temporal
- training process
- training algorithm
- action recognition
- video shots
- training data
- image sequences
- specification language
- multimedia