Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning.
A. J. Piergiovanni, Weicheng Kuo, Anelia Angelova
Published in: CVPR (2023)
Keyphrases
- video data
- multimedia
- video sequences
- video analysis
- real time
- image features
- visual data
- visual cues
- video content
- key frames
- multimedia data
- video streams
- video images
- multiscale
- image classification
- video retrieval
- image frames
- space time
- image data
- low level
- images and video sequences
- image segmentation
- image processing
- video files
- weakly labeled
- visual information
- image regions
- segmentation method
- image representation
- feature vectors
- image retrieval
- high dimensional