Learning Contextually Fused Audio-Visual Representations For Audio-Visual Speech Recognition

Ziqiang Zhang, Jie Zhang, Jian-Shu Zhang, Ming-Hui Wu, Xin Fang, Lirong Dai
Published in: ICIP (2022)
Keyphrases
  • audio visual
  • audio visual speech recognition
  • multi-stream
  • multimodal
  • multiscale
  • keywords
  • image features
  • visual information
  • emotion recognition