BEVT: BERT Pretraining of Video Transformers.
Rui Wang, Dongdong Chen, Zuxuan Wu, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Yu-Gang Jiang, Luowei Zhou, Lu Yuan
Published in: CVPR (2022)
Keyphrases
- video sequences
- multimedia
- video data
- video frames
- video content
- video database
- video streams
- space time
- video clips
- multimedia data
- digital video
- multi agent
- spatial temporal
- key frames
- neural network
- video surveillance
- data sets
- spatial and temporal
- video images
- real time video
- online video
- video retrieval
- event detection
- database
- multi modal
- case study
- real time