Language-guided Multi-Modal Fusion for Video Action Recognition.
Jenhao Hsiao, Yikang Li, Chiuman Ho
Published in: ICCVW (2021)
Keyphrases
- action recognition
- multi-modal fusion
- human actions
- action classification
- spatio-temporal
- video database
- action detection
- video dataset
- recognition of human actions
- recognizing human actions
- activity recognition
- motion features
- static images
- bag of words
- human detection
- human activities
- space-time interest points
- computer vision
- motion history images
- body parts
- recognizing actions
- bag of features
- mid-level
- video content
- video data
- human pose
- facial features
- atomic actions
- multi-class
- machine learning
- feature selection
- video sequences
- view invariant
- video images
- video analysis