Efficient Video Transformers via Spatial-temporal Token Merging for Action Recognition.
Zhanzhou Feng, Jiaming Xu, Lei Ma, Shiliang Zhang
Published in: ACM Trans. Multim. Comput. Commun. Appl. (2024)
Keyphrases
- spatial temporal
- action recognition
- human actions
- bag of words
- activity recognition
- action classification
- video shots
- spatio temporal
- recognition of human actions
- recognizing human actions
- video database
- computer vision
- action detection
- video dataset
- space time interest points
- static images
- human activities
- motion features
- video retrieval
- temporal information
- image classification
- video sequences