Multimodal Adaptation of CLIP for Few-Shot Action Recognition.
Jiazheng Xing, Mengmeng Wang, Xiaojun Hou, Guang Dai, Jingdong Wang, Yong Liu
Published in: CoRR (2023)
Keyphrases
- action recognition
- human actions
- key frames
- bag of words
- activity recognition
- human detection
- video clips
- computer vision
- spatial temporal
- action classification
- video data
- multi modal
- body parts
- video sequences
- recognizing human actions
- view invariant
- video shots
- visual features
- low level features
- recognition of human actions
- action detection
- video dataset
- feature vectors
- mid level
- static images
- view invariant action recognition
- independent subspace analysis
- action recognition in videos
- image retrieval
- action primitives
- video indexing
- human pose
- event detection
- video content