Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training.
Yingwei Pan, Yehao Li, Jianjie Luo, Jun Xu, Ting Yao, Tao Mei
Published in: CoRR (2020)
Keyphrases
- video content
- natural language
- TRECVID multimedia event detection
- video data
- event recognition
- real time
- multimedia
- video sequences
- human actions
- event detection
- computer vision
- target language
- video frames
- video streams
- visual features
- syntactic parsing
- training dataset
- temporal information
- vision system
- video dataset
- space time
- image sequences
- weakly labeled
- training corpus
- news video
- programming language
- object detectors
- video database
- visual information
- text summarization
- video retrieval
- web videos
- million images
- classifier training
- parallel corpus
- video analysis
- training set
- image search
- source language
- visual content
- textual descriptions
- sentence level
- syntactic categories
- training examples
- semantic role labeling