Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT.
Bowen Shi
Abdelrahman Mohamed
Wei-Ning Hsu
Published in:
INTERSPEECH (2022)
Keyphrases
audio visual
multi modal
multimedia
visual information
audio visual speech recognition
computer vision
speaker verification
person authentication
three dimensional
feature extraction
domain knowledge
co occurrence
visual data
emotion recognition
multi stream