Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction.
Bowen Shi
Wei-Ning Hsu
Kushal Lakhotia
Abdelrahman Mohamed
Published in:
ICLR (2022)
Keyphrases
audio visual
multi modal
multi stream
visual information
emotion recognition
multimodal fusion
multimedia
visual data
computer vision
domain knowledge
speech recognition
affective states
audio features
person authentication
audio visual speech recognition