Vision Transformers are Parameter-Efficient Audio-Visual Learners
Yan-Bo Lin
Yi-Lin Sung
Jie Lei
Mohit Bansal
Gedas Bertasius
Published in: CVPR (2023)
Keyphrases
audio-visual
multimodal
multi-stream
computer vision
visual information
emotion recognition
person authentication
e-learning
image processing
visual data
temporal context
audio-visual speech recognition