Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition.
Kun-Yu Lin, Henghui Ding, Jiaming Zhou, Yi-Xing Peng, Zhilin Zhao, Chen Change Loy, Wei-Shi Zheng
Published in: CoRR (2024)
Keyphrases
- spatio-temporal
- cross domain
- human actions
- action recognition
- spatial temporal
- action classification
- recognition of human actions
- video dataset
- recognizing human actions
- knowledge transfer
- action detection
- video clips
- transfer learning
- human activities
- text categorization
- learning environment
- action recognition in videos
- static images
- human motion
- bag of words
- learning process
- image sequences
- e-learning
- video frames
- video segments
- learning experience
- video sequences
- computer vision
- atomic actions
- motion history images
- space-time interest points
- video streams
- low-level features
- activity recognition
- e-government
- event recognition
- video content
- key frames
- visual words
- video data
- semi-supervised
- keywords