AV-data2vec: Self-Supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations.

Published in: ASRU (2023)