Zero-Shot Action Recognition with Transformer-based Video Semantic Embedding.
Keval Doshi, Yasin Yilmaz
Published in: CoRR (2022)
Keyphrases
- action recognition
- human actions
- action classification
- video dataset
- spatial temporal
- action detection
- recognizing human actions
- human activities
- motion features
- recognition of human actions
- activity recognition
- static images
- space time interest points
- bag of words
- human detection
- mid level
- semantic concepts
- computer vision
- video sequences
- video clips
- body parts
- high level
- semantic information
- space time
- recognizing actions
- video content
- human pose
- video frames
- video data
- motion history images
- multimedia
- action recognition in videos
- view invariant
- low level features
- video shots
- atomic actions
- object recognition
- low level
- action primitives
- event recognition
- video retrieval
- semantic similarity