VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text.
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
Published in:
NeurIPS (2021)
Keyphrases
multimedia
learning process
real time
learning algorithm
digital video
information retrieval
supervised learning
online learning
video data
audio video
multimodal fusion
reinforcement learning
video sequences
signal processing
multi modal