Cascaded Multilingual Audio-Visual Learning from Videos.
Andrew Rouditchenko
Angie W. Boggust
David Harwath
Samuel Thomas
Hilde Kuehne
Brian Chen
Rameswar Panda
Rogério Feris
Brian Kingsbury
Michael Picheny
James R. Glass
Published in:
CoRR (2021)
Keyphrases
audio visual
visual information
data sets
e learning
three dimensional
object recognition
text mining
image database
multi modal