VIOLIN: A Large-Scale Dataset for Video-and-Language Inference.
Jingzhou Liu, Wenhu Chen, Yu Cheng, Zhe Gan, Licheng Yu, Yiming Yang, Jingjing Liu
Published in: CoRR (2020)
Keyphrases
- TRECVID multimedia event detection
- event recognition
- event detection
- video content
- human actions
- video data
- programming language
- video sequences
- weakly labeled
- natural language
- video dataset
- video streams
- real world
- probabilistic inference
- space time
- web videos
- multimedia
- video clips
- video database
- video analysis
- video frames
- benchmark datasets
- language learning
- video surveillance
- million images
- small scale
- Bayesian networks
- Bayesian inference
- action recognition
- real time
- video shots
- computer vision
- audio signal
- belief networks
- video collections
- key frames
- probabilistic model