ViLP: Knowledge Exploration using Vision, Language, and Pose Embeddings for Video Action Recognition.
Soumyabrata Chaudhuri, Saumik Bhattacharya
Published in: CoRR (2023)
Keyphrases
- action recognition
- human actions
- action detection
- action classification
- computer vision
- spatial temporal
- human pose
- recognizing actions
- bag of words
- video dataset
- recognizing human actions
- activity recognition
- recognition of human actions
- mid level
- static images
- human detection
- video sequences
- motion features
- human activities
- space time interest points
- body parts
- multimedia
- vision system
- motion history images
- bag of features
- pose estimation
- view invariant
- spatio temporal
- machine learning
- video analysis
- video clips
- depth cameras
- video data
- depth sensors