ViLP: Knowledge Exploration using Vision, Language, and Pose Embeddings for Video Action Recognition.
Soumyabrata Chaudhuri, Saumik Bhattacharya
Published in: ICVGIP (2023)
Keyphrases
- action recognition
- human actions
- action detection
- action classification
- spatial temporal
- human pose
- computer vision
- recognizing actions
- video dataset
- bag of words
- recognizing human actions
- static images
- activity recognition
- recognition of human actions
- human detection
- motion features
- body parts
- mid level
- pose estimation
- space time interest points
- human activities
- space time
- view invariant
- atomic actions
- depth sensors
- object detection
- vision system
- depth cameras
- 3D objects
- machine learning
- motion history images
- video frames
- video shots
- partial occlusion
- temporal information