BEVT: BERT Pretraining of Video Transformers.

Rui Wang, Dongdong Chen, Zuxuan Wu, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Yu-Gang Jiang, Luowei Zhou, Lu Yuan

Published in: CVPR (2022)

Keyphrases

video sequences
multimedia
video data
video frames
video content
video database
video streams
space time
video clips
multimedia data
digital video
multi agent
spatial temporal
key frames
neural network
video surveillance
data sets
spatial and temporal
video images
real time video
online video
video retrieval
event detection
database
multi modal
case study
real time