Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition.
Yifei Chen, Dapeng Chen, Ruijin Liu, Sai Zhou, Wenyuan Xue, Wei Peng
Published in: CoRR (2023)
Keyphrases
- machine learning
- action recognition
- human actions
- action classification
- spatial temporal
- video dataset
- action detection
- computer vision
- recognizing human actions
- space time interest points
- recognition of human actions
- activity recognition
- static images
- motion features
- bag of words
- human activities
- spatio temporal interest points
- motion history images
- human detection
- mid level
- video data
- video sequences
- video frames
- video images
- video content
- recognizing actions
- body parts
- bag of features
- space time
- pairwise
- depth sensors
- motion capture data
- key frames
- text classification
- video shots
- view invariant
- input image
- video analysis
- spatio temporal
- visual features
- human motion
- video surveillance