Building a Video-and-Language Dataset with Human Actions for Multimodal Logical Inference.
Riko Suzuki, Hitomi Yanaka, Koji Mineshima, Daisuke Bekki
Published in: CoRR (2021)
Keyphrases
- human actions
- logical inference
- action recognition
- spatio temporal
- video sequences
- space time
- human motion
- human activities
- action classification
- activity recognition
- theorem proving
- logical structure
- recognition of human actions
- visual features
- recognizing human actions
- recognizing actions
- visual data
- space time interest points
- action sequences
- video shots
- web videos
- natural language
- video data
- computer vision
- sensor data
- active learning
- image sequences
- multimedia