Self-Supervised Audio-Visual Speech Representations Learning By Multimodal Self-Distillation.
Jing-Xuan Zhang
Genshun Wan
Zhen-Hua Ling
Jia Pan
Jianqing Gao
Cong Liu
Published in: CoRR (2022)
Keyphrases
audio visual
multi modal
multi stream
visual information
multimedia
visual data
speaker verification
data sets
audio visual speech recognition
principal component analysis
nearest neighbor
hidden Markov models
affective states
emotion recognition
multimodal interfaces
person authentication
multimodal fusion