Self-Supervised Audio-Visual Speech Representations Learning by Multimodal Self-Distillation.
Jing-Xuan Zhang
Genshun Wan
Zhen-Hua Ling
Jia Pan
Jianqing Gao
Cong Liu
Published in:
ICASSP (2023)
Keyphrases
audio visual
multi modal
visual information
multi stream
multimedia
data sets
speaker verification
audio visual speech recognition
computer vision
visual data
three dimensional
image data
multimedia data
audio features
multimodal fusion