What, when, and where? - Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions.
Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Daniel Kondermann, Samuel Thomas, Shih-Fu Chang, Rogério Feris, James R. Glass, Hilde Kuehne
Published in: CoRR (2023)
Keyphrases
- human actions
- spatio temporal
- action recognition
- spatio temporal interest points
- view invariant
- spatio temporal patterns
- action classification
- space time
- recognition of human actions
- video database
- video representation
- human motion
- video sequences
- activity recognition
- human activities
- recognizing human actions
- spatial temporal
- recognizing actions
- action sequences
- visual features
- moving objects
- dynamic scenes
- video surveillance
- motion trajectories
- motion features
- spatio temporal data
- temporal segmentation
- action recognition in videos
- multi view
- computer vision
- spatio temporally
- space time interest points
- temporal domain
- video clips
- spatial and temporal
- reinforcement learning
- three dimensional