Learning Contextually Fused Audio-Visual Representations For Audio-Visual Speech Recognition.
Ziqiang Zhang
Jie Zhang
Jian-Shu Zhang
Ming-Hui Wu
Xin Fang
Lirong Dai
Published in:
ICIP (2022)
Keyphrases
audio visual
audio visual speech recognition
multi stream
multi modal
multiscale
keywords
image features
visual information
emotion recognition