Violin: A Large-Scale Dataset for Video-and-Language Inference.
Jingzhou Liu, Wenhu Chen, Yu Cheng, Zhe Gan, Licheng Yu, Yiming Yang, Jingjing Liu
Published in: CVPR (2020)
Keyphrases
- trecvid multimedia event detection
- event recognition
- video data
- weakly labeled
- multimedia
- video content
- video streams
- programming language
- human actions
- event detection
- real time
- video sequences
- small scale
- web videos
- bayesian networks
- space time
- language learning
- database
- benchmark datasets
- natural language
- video surveillance
- probabilistic inference
- real life
- bayesian inference
- video analysis
- video database
- data model
- real world
- million images
- audio signal
- video search
- digital video
- web scale
- inference process
- target language
- feature vectors
- video frames
- action recognition