SimVTP: Simple Video Text Pre-training with Masked Autoencoders.
Yue Ma, Tianyu Yang, Yin Shan, Xiu Li
Published in: CoRR (2022)
Keyphrases
- multimedia
- real time
- video data
- video sequences
- database
- video frames
- text mining
- natural language descriptions
- information retrieval
- denoising
- hidden markov models
- training examples
- text documents
- text retrieval
- video content
- free text
- video database
- text classifiers
- feedforward neural networks
- video images
- neural network