Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers.
Yasheng Sun
Hang Zhou
Kaisiyuan Wang
Qianyi Wu
Zhibin Hong
Jingtuo Liu
Errui Ding
Jingdong Wang
Ziwei Liu
Hideki Koike
Published in:
SIGGRAPH Asia (2022)
Keyphrases
audio visual
audio visual speech recognition
multi modal
visual information
multi stream
visual data
multimedia
emotion recognition
video summarization
contextual information
temporal context
person authentication
image data
multimodal fusion
data sets
audio features