Vision Transformers are Parameter-Efficient Audio-Visual Learners.
Yan-Bo Lin
Yi-Lin Sung
Jie Lei
Mohit Bansal
Gedas Bertasius
Published in:
CoRR (2022)
Keyphrases
emotion recognition
audio visual
multi modal
visual information
video summarization
visual data
temporal context
multimedia
multi stream
audio visual speech recognition
pattern recognition
wordnet