VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning.

Published in: CoRR (2022)

Keyphrases