AV-data2vec: Self-Supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations.

Jiachen Lian, Alexei Baevski, Wei-Ning Hsu, Michael Auli
Published in: ASRU (2023)
Keyphrases
  • multimedia
  • image data
  • multiscale
  • hidden markov models
  • visual data
  • extracted features
  • speaker identification