MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition.
Chao-Yuan Wu, Yanghao Li, Karttikeya Mangalam, Haoqi Fan, Bo Xiong, Jitendra Malik, Christoph Feichtenhofer
Published in: CoRR (2022)
Keyphrases
- long term
- multiscale
- short term
- image processing
- object recognition
- recognition rate
- real time
- video sequences
- computer vision
- character recognition
- wavelet transform
- human activities
- neural network
- space time
- fuzzy logic
- recognition accuracy
- scale space
- video data
- image segmentation
- fault diagnosis
- activity recognition
- video streams
- vision system
- moving objects
- video surveillance
- recognition algorithm
- memory usage
- memory space
- limited memory