Audio-Visual Transformer Based Crowd Counting.
Usman Sajid
Xiangyu Chen
Hasan Sajid
Taejoon Kim
Guanghui Wang
Published in:
ICCVW (2021)
Keyphrases
audio visual
multi modal
visual data
visual information
video summarization
multimedia
temporal context
emotion recognition
person authentication
audio visual speech recognition
multi stream
e learning
domain knowledge
multimodal fusion