Evaluation of Audio-Visual Alignments in Visually Grounded Speech Models.
Khazar Khorrami
Okko Räsänen
Published in:
Interspeech (2021)
Keyphrases
audio visual
multi modal
visual data
multi stream
visual information
emotion recognition
multimedia
audio features
high level
temporal context
audio visual speech recognition
computer vision
contextual information
person authentication