AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations.

Published in: CoRR (2023)

Keyphrases