Entity-aware and Motion-aware Transformers for Language-driven Action Localization in Videos.
Shuo Yang, Xinxiao Wu
Published in: CoRR (2022)
Keyphrases
- recognizing human actions
- recognition of human actions
- human motion
- human actions
- natural language
- motion estimation
- moving objects
- dynamic scenes
- video sequences
- action classification
- moving camera
- video surveillance
- image sequences
- key frames
- space time
- language learning
- spatial and temporal
- static images
- motion features
- position information
- input video
- traffic scenes
- human body
- pose estimation