VatLM: Visual-Audio-Text Pre-Training With Unified Masked Prediction for Speech Representation Learning.

Published in: IEEE Transactions on Multimedia (2024)

Keyphrases