AV-data2vec: Self-Supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations.
Jiachen Lian
Alexei Baevski
Wei-Ning Hsu
Michael Auli
Published in:
ASRU (2023)
Keyphrases
multimedia
image data
multiscale
hidden Markov models
visual data
extracted features
speaker identification