Co-training Transformer with Videos and Images Improves Action Recognition.
Bowen Zhang, Jiahui Yu, Christopher Fifty, Wei Han, Andrew M. Dai, Ruoming Pang, Fei Sha
Published in: CoRR (2021)
Keyphrases
- action recognition
- video dataset
- static images
- human actions
- co-training
- action classification
- recognition of human actions
- view-invariant
- image data
- low-level descriptors
- recognizing human actions
- computer vision
- activity recognition
- recognizing actions
- input image
- bag of words
- action detection
- human activities
- semi-supervised learning
- small number
- action recognition in videos
- image classification
- image retrieval
- three dimensional
- single view
- semi-supervised
- similarity measure
- video frames
- unlabeled data
- text classification
- object recognition
- named entities
- human pose
- multi-view
- human-object interactions
- video sequences