VGGSound: A Large-scale Audio-Visual Dataset.
Honglie Chen
Weidi Xie
Andrea Vedaldi
Andrew Zisserman
Published in: CoRR (2020)
Keyphrases
audio visual
multi modal
visual information
visual data
audio visual speech recognition
emotion recognition
multi stream
multimedia
video summarization
temporal context
person authentication
database
data analysis
multimodal fusion
feature set
image retrieval
human actions
data sets