Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training.
Yingwei Pan
Yehao Li
Jianjie Luo
Jun Xu
Ting Yao
Tao Mei
Published in:
ACM Multimedia (2022)
Keyphrases
video content
natural language
TRECVID multimedia event detection
real time
event recognition
event detection
video data
training dataset
human actions
video sequences
video streams
multimedia
vision system
target language
training examples
video analysis
video frames
weakly labeled
computer vision
temporal information
training set
programming language
news video
video clips
image search
syntactic parsing
classifier training
source language
space time
video retrieval
image classification
video database
text classification
training corpus
video dataset
video shots
machine translation
human activities
key frames