SVFormer: A Direct Training Spiking Transformer for Efficient Video Action Recognition.
Liutao Yu, Liwei Huang, Chenlin Zhou, Han Zhang, Zhengyu Ma, Huihui Zhou, Yonghong Tian
Published in: CoRR (2024)
Keyphrases
- action recognition
- human actions
- action classification
- spatial temporal
- action detection
- video dataset
- bag of words
- recognition of human actions
- recognizing human actions
- static images
- human detection
- human activities
- motion features
- activity recognition
- space time interest points
- computer vision
- body parts
- mid level
- recognizing actions
- video data
- motion history images
- atomic actions
- video sequences
- spatio temporal
- bag of features
- space time
- depth sensors
- temporal information
- view invariant
- video content
- action recognition in videos
- video shots