Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning.
AJ Piergiovanni, Weicheng Kuo, Anelia Angelova. Published in: CoRR (2022)
Keyphrases
- video sequences
- video images
- video data
- real time
- multimedia
- visual data
- video content
- static images
- image frames
- video frames
- learning algorithm
- object motion
- video analysis
- video streams
- single image
- key frames
- image classification
- video files
- image collections
- video clips
- segmentation method
- image representation
- input image
- temporal continuity
- images and video sequences
- weakly labeled