Spatiotemporal Representation Enhanced ViT for Video Recognition.
Min Li, Fengfa Li, Bo Meng, Ruwen Bai, Junxing Ren, Zihao Huang, Chenghua Gao
Published in: MMM (1) (2024)
Keyphrases
- space time
- human activities
- spatial and temporal
- video sequences
- spatiotemporal features
- object recognition
- video representation
- temporal structure
- pattern recognition
- invariant representation
- recognition rate
- recognition accuracy
- temporal segmentation
- video streams
- multimedia
- real time
- character recognition
- invariant recognition
- activity detection
- spatio temporal
- text detection
- object detection
- visual speech recognition
- moving objects
- object representations
- video data
- object representation
- representation scheme
- automatic recognition
- video content
- activity recognition