MaskViT: Masked Visual Pre-Training for Video Prediction.
Agrim Gupta, Stephen Tian, Yunzhi Zhang, Jiajun Wu, Roberto Martín-Martín, Li Fei-Fei
Published in: CoRR (2022)
Keyphrases
- prediction accuracy
- visual cues
- visual data
- visual analysis
- video data
- video sequences
- content-based video retrieval
- video streams
- training samples
- training set
- visual information
- news video
- video database
- multi-layer perceptron
- multimedia data
- video analysis
- radial basis function network
- real-time
- video content
- space-time
- supervised learning
- multimedia
- prediction model
- video frames
- prediction algorithm
- training phase
- visual perception
- scalable video coding
- video clips
- video search
- training process
- prediction error
- video retrieval
- key frames
- spatio-temporal
- learning algorithm