M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition.
Mengmeng Wang, Jiazheng Xing, Boyuan Jiang, Jun Chen, Jianbiao Mei, Xingxing Zuo, Guang Dai, Jingdong Wang, Yong Liu
Published in: CoRR (2024)
Keyphrases
- action recognition
- multi task
- video dataset
- human actions
- action classification
- video clips
- activity recognition
- recognizing human actions
- bag of words
- multi task learning
- recognition of human actions
- sparse learning
- action detection
- video sequences
- computer vision
- body parts
- space time interest points
- transfer learning
- learning tasks
- video frames
- video retrieval
- key frames
- gaussian processes
- human detection
- learning experience
- multi class
- knn
- probabilistic model
- image sequences
- feature selection