VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning.
Qiu-Shi Zhu
Long Zhou
Ziqiang Zhang
Shujie Liu
Binxing Jiao
Jie Zhang
Lirong Dai
Daxin Jiang
Jinyu Li
Furu Wei
Published in:
CoRR (2022)
Keyphrases
supervised learning
online learning
visual information
learning algorithm
learning process
training set
text mining
audio visual
text to speech
information retrieval
audio stream
text graphics
content based video retrieval
audio signals
visual representation
prediction accuracy
learning environment
reinforcement learning