AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations.
Jiachen Lian
Alexei Baevski
Wei-Ning Hsu
Michael Auli
Published in:
CoRR (2023)
Keyphrases
data quality
multimedia
video sequences
hidden markov models