Vision Transformer with Cross-attention by Temporal Shift for Efficient Action Recognition.
Ryota Hashiguchi, Toru Tamaki
Published in: CoRR (2022)
Keyphrases
- action recognition
- computer vision
- spatial temporal
- activity recognition
- human actions
- bag of words
- human detection
- recognizing human actions
- static images
- body parts
- recognition of human actions
- action classification
- bag of features
- spatio temporal
- spatial and temporal
- depth sensors
- motion history images
- recognizing actions
- temporal information
- human activities
- text classification
- human pose
- vision system
- view invariant
- object recognition
- high level