Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling.
Shentong Mo
Pedro Morgado
Published in:
CoRR (2023)
Keyphrases
audio visual
person authentication
multi modal
multimodal fusion
visual data
multimedia
visual information
temporal context
databases
emotion recognition
multi stream
three dimensional
low level
audio visual speech recognition