Auxiliary audio-textual modalities for better action recognition on vision-specific annotated videos.
Saghir Alfasly, Jian Lu, Chen Xu, Yu Li, Yuru Zou
Published in: Pattern Recognit. (2024)
Keyphrases
- action recognition
- human actions
- action classification
- computer vision
- recognition of human actions
- video dataset
- recognizing human actions
- visual data
- recognizing actions
- view invariant
- video database
- multimedia
- static images
- human detection
- ucf sports
- human activities
- bag of words
- space time interest points
- activity recognition
- spatio temporal interest points
- spatial temporal
- action detection
- motion features
- action recognition in videos
- mid level features
- depth sensors
- motion history images
- body parts
- audio visual
- action primitives
- high level
- video sequences
- spatio temporal
- human object interactions
- video frames
- motion capture data
- multi modal
- space time
- bag of features