Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks.
Haiyang Xu, Qinghao Ye, Xuan Wu, Ming Yan, Yuan Miao, Jiabo Ye, Guohai Xu, Anwen Hu, Yaya Shi, Guangwei Xu, Chenliang Li, Qi Qian, Maofei Que, Ji Zhang, Xiao Zeng, Fei Huang
Published in: CoRR (2023)
Keyphrases
- trecvid multimedia event detection
- real world
- video streams
- video frames
- million images
- event recognition
- human actions
- real time
- training dataset
- training set
- video data
- video database
- event detection
- multimedia
- chinese language
- programming language
- video content
- natural language
- video sequences
- video clips
- video dataset
- video retrieval
- training process
- space time
- real life
- language learning
- object detectors
- classifier training
- neural network
- video shots
- video analysis
- video surveillance
- benchmark datasets
- training samples
- supervised learning
- small scale
- web scale
- key frames
- action recognition
- story segmentation
- weakly labeled
- chinese web