SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training.

Yuanze Lin, Chen Wei, Huiyu Wang, Alan L. Yuille, Cihang Xie

Published in: CoRR (2022)

Keyphrases

multimedia
training process
training set
video data
data sets
restricted boltzmann machine
video analysis
video content
video streams
decision trees
optical flow
active learning
supervised learning
artificial neural networks
natural language
temporal information
language learning
computer vision
neural network
real time