Audio-Visual Transformer Based Crowd Counting.
Usman Sajid
Xiangyu Chen
Hasan Sajid
Taejoon Kim
Guanghui Wang
Published in:
CoRR (2021)
Keyphrases
audio visual
multi modal
visual information
visual data
video summarization
multimedia
temporal context
emotion recognition
multi stream
domain knowledge
person authentication
audio visual speech recognition
audio visual content
data sets
knowledge base
multiscale