VGGSound: A Large-Scale Audio-Visual Dataset.
Honglie Chen
Weidi Xie
Andrea Vedaldi
Andrew Zisserman
Published in:
ICASSP (2020)
Keyphrases
audio visual
multi modal
visual information
video summarization
temporal context
multimedia
emotion recognition
audio visual speech recognition
visual data
multi stream
person authentication
multiscale
database
metadata
domain knowledge
image data