Practice of the Conformer Enhanced Audio-Visual HuBERT on Mandarin and English.
Xiaoming Ren
Chao Li
Shenjian Wang
Biao Li
Published in:
CoRR (2023)
Keyphrases
audio visual
emotion recognition
multi modal
visual information
multi stream
visual data
temporal context
person authentication
speech recognition
speaker verification
audio visual speech recognition
data sets
multimedia
broadcast news
three dimensional
natural language