CAST: Cross-Attention in Space and Time for Video Action Recognition.
Dongho Lee, Jongseo Lee, Jinwoo Choi
Published in: CoRR (2023)
Keyphrases
- action recognition
- human actions
- action classification
- spatial temporal
- video dataset
- space time
- action detection
- recognizing human actions
- static images
- recognition of human actions
- motion features
- activity recognition
- space time interest points
- human activities
- bag of words
- computer vision
- spatio temporal interest points
- human detection
- body parts
- mid level
- video sequences
- recognizing actions
- spatial and temporal
- video data
- view invariant
- human pose
- multimedia
- spatio temporal
- key frames
- video content
- motion history images
- object detection
- max margin
- video images
- video streams