Towards Training Stronger Video Vision Transformers for EPIC-KITCHENS-100 Action Recognition.
Ziyuan Huang, Zhiwu Qing, Xiang Wang, Yutong Feng, Shiwei Zhang, Jianwen Jiang, Zhurong Xia, Mingqian Tang, Nong Sang, Marcelo H. Ang Jr.
Published in: CoRR (2021)
Keyphrases
- action recognition
- human actions
- action classification
- computer vision
- spatial temporal
- action detection
- video dataset
- recognizing human actions
- recognition of human actions
- motion features
- static images
- bag of words
- human activities
- activity recognition
- spatio temporal interest points
- space time interest points
- human detection
- video sequences
- motion history images
- recognizing actions
- mid level
- human pose
- vision system
- action primitives
- body parts
- video data
- space time
- multimedia
- depth sensors
- view invariant
- human activity recognition
- video content
- event detection
- key frames
- event recognition
- video surveillance
- spatio temporal
- action recognition in videos
- video clips
- video shots