Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT.
Bowen Shi
Abdelrahman Mohamed
Wei-Ning Hsu
Published in:
CoRR (2022)
Keyphrases
audio-visual
multimodal
visual information
speaker verification
audio-visual speech recognition
multimedia
multi-stream
visual data
person authentication
emotion recognition
data sets
machine learning
spatio-temporal
co-occurrence
human motion
temporal context