Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers.
Yasheng Sun
Hang Zhou
Kaisiyuan Wang
Qianyi Wu
Zhibin Hong
Jingtuo Liu
Errui Ding
Jingdong Wang
Ziwei Liu
Hideki Koike
Published in: CoRR (2022)
Keyphrases
audio visual
audio visual speech recognition
multi modal
visual information
multi stream
visual data
multimedia
emotion recognition
contextual information
video summarization
person authentication
image data
temporal context
computer vision
high level
image representation