EgoViT: Pyramid Video Transformer for Egocentric Action Recognition.
Chenbin Pan, Zhiqi Zhang, Senem Velipasalar, Yi Xu. Published in: CoRR (2023)
Keyphrases
- action recognition
- human actions
- activity recognition
- action classification
- spatial temporal
- video dataset
- action detection
- human activities
- recognition of human actions
- motion features
- recognizing human actions
- static images
- space time interest points
- bag of words
- computer vision
- recognizing actions
- human activity recognition
- human detection
- body parts
- video sequences
- multimedia
- event recognition
- mid level
- video content
- view invariant
- video data
- depth sensors
- motion history images
- multiscale
- action recognition in videos
- space time
- spatio temporal
- human pose
- motion capture data
- video frames
- histogram of oriented gradients
- event detection
- image representation
- bag of features