Cascaded Multilingual Audio-Visual Learning from Videos
Andrew Rouditchenko
Angie W. Boggust
David Harwath
Samuel Thomas
Hilde Kuehne
Brian Chen
Rameswar Panda
Rogério Feris
Brian Kingsbury
Michael Picheny
James R. Glass
Published in:
Interspeech (2021)
Keyphrases
audio visual
multi modal
search engine
knowledge base
contextual information
visual information
visual data
temporal context