Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video

Dmitriy Serdyuk, Otavio Braga, Olivier Siohan
Published in: INTERSPEECH (2022)
Keyphrases
  • video sequences
  • video content
  • video data
  • video streams
  • multimedia
  • video frames
  • video retrieval
  • audio visual speech recognition
  • video analysis
  • probabilistic model
  • compressed video