AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations.

Jiachen Lian, Alexei Baevski, Wei-Ning Hsu, Michael Auli
Published in: CoRR (2023)
Keyphrases
  • data quality
  • multimedia
  • video sequences
  • hidden markov models