Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video
Dmitriy Serdyuk
Otavio Braga
Olivier Siohan
Published in: INTERSPEECH (2022)
Keyphrases
video sequences
video content
video data
video streams
multimedia
video frames
video retrieval
audio visual speech recognition
video analysis
probabilistic model
compressed video