Entity-aware and Motion-aware Transformers for Language-driven Action Localization in Videos.

Shuo Yang, Xinxiao Wu

Published in: CoRR (2022)

Keyphrases

recognizing human actions
recognition of human actions
human motion
human actions
natural language
motion estimation
moving objects
dynamic scenes
video sequences
action classification
moving camera
video surveillance
image sequences
key frames
space time
language learning
spatial and temporal
static images
motion features
position information
input video
traffic scenes
human body
pose estimation