Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning.

AJ Piergiovanni, Weicheng Kuo, Anelia Angelova

Published in: CoRR (2022)

Keyphrases

video sequences
video images
video data
real time
multimedia
visual data
video content
static images
image frames
video frames
learning algorithm
object motion
video analysis
video streams
single image
key frames
image classification
video files
image collections
video clips
segmentation method
image representation
input image
temporal continuity
images and video sequences
weakly labeled