See, move and hear: a local-to-global multi-modal interaction network for video action recognition.
Fan Feng, Yue Ming, Nannan Hu, Jiangwan Zhou
Published in: Appl. Intell. (2023)
Keyphrases
- multi modal
- action recognition
- human actions
- action classification
- spatial temporal
- video dataset
- action detection
- semantic concepts
- video database
- recognition of human actions
- recognizing human actions
- video search
- static images
- bag of words
- space time interest points
- human activities
- video sequences
- high dimensional
- activity recognition
- recognizing actions
- motion history images
- audio visual
- motion features
- computer vision
- video data
- video frames
- multi modality
- spatio temporal
- multiple modalities
- spatial and temporal
- humanoid robot
- human pose
- global features
- video shots
- visual data
- multimedia
- feature selection
- uni modal
- image annotation