Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training.
Yingwei Pan, Yehao Li, Jianjie Luo, Jun Xu, Ting Yao, Tao Mei
Published in: ACM Multimedia (2022)
Keyphrases
- video content
- natural language
- TRECVID multimedia event detection
- real time
- event recognition
- event detection
- video data
- training dataset
- human actions
- video sequences
- video streams
- multimedia
- vision system
- target language
- training examples
- video analysis
- video frames
- weakly labeled
- computer vision
- temporal information
- training set
- programming language
- news video
- video clips
- image search
- syntactic parsing
- classifier training
- source language
- space time
- video retrieval
- image classification
- video database
- text classification
- training corpus
- video dataset
- video shots
- machine translation
- human activities
- key frames