Audio-visual fine-tuning of audio-only ASR models.
Avner May
Dmitriy Serdyuk
Ankit Parag Shah
Otavio Braga
Olivier Siohan
Published in:
CoRR (2023)
Keyphrases
audio visual
fine tuning
multi modal
visual information
visual data
multimedia
multi stream
audio visual speech recognition
multimodal fusion
emotion recognition
audio features
temporal context
computer vision
person authentication
semantic information
hidden Markov models
audio visual content