MM-ViT: Multi-Modal Video Transformer for Compressed Video Action Recognition.
Jiawei Chen, Chiu Man Ho
Published in: WACV (2022)
Keyphrases
- multi modal
- action recognition
- compressed video
- video streams
- video content
- human actions
- video quality
- action classification
- video frames
- super resolution
- activity recognition
- compressed domain
- human detection
- spatial temporal
- computer vision
- data hiding
- video database
- high dimensional
- video search
- human activities
- audio visual
- bitstream
- body parts
- semantic concepts
- video data
- humanoid robot
- feature selection
- high resolution
- similarity measure
- multiscale
- machine learning