SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training.
Yuanze Lin, Chen Wei, Huiyu Wang, Alan L. Yuille, Cihang Xie
Published in: CoRR (2022)
Keyphrases
- multimedia
- training process
- training set
- video data
- data sets
- restricted boltzmann machine
- video analysis
- video content
- video streams
- decision trees
- optical flow
- active learning
- supervised learning
- artificial neural networks
- natural language
- temporal information
- language learning
- computer vision
- neural network
- real time