Audio-Visual Speech Recognition is Worth $32\times 32\times 8$ Voxels.

Dmitriy Serdyuk, Otavio Braga, Olivier Siohan
Published in: ASRU (2021)
Keyphrases
  • motion estimation
  • audio visual speech recognition
  • multimedia