Evaluation of Audio-Visual Alignments in Visually Grounded Speech Models.
Khazar Khorrami
Okko Räsänen
Published in:
CoRR (2021)
Keyphrases
audio visual
multi modal
visual information
multi stream
visual data
emotion recognition
speaker verification
audio visual speech recognition
temporal context