MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition.
Chao-Yuan Wu, Yanghao Li, Karttikeya Mangalam, Haoqi Fan, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer
Published in: CVPR (2022)
Keyphrases
- long term
- multiscale
- image processing
- computer vision
- limited memory
- real time
- object recognition
- recognition rate
- visual perception
- video sequences
- short term
- image representation
- automatic recognition
- memory usage
- video images
- space time
- fault diagnosis
- video streams
- video content
- gesture recognition
- character recognition
- video surveillance
- coarse to fine
- multimedia
- memory requirements
- human activities
- action recognition
- video data
- object detection
- fuzzy logic
- wavelet transform
- image sequences
- feature extraction
- computational complexity
- pattern recognition