Cascaded Multilingual Audio-Visual Learning from Videos.

Andrew Rouditchenko, Angie W. Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogério Feris, Brian Kingsbury, Michael Picheny, James R. Glass
Published in: Interspeech (2021)
Keyphrases
  • audio visual
  • multi modal
  • search engine
  • knowledge base
  • contextual information
  • visual information
  • visual data
  • temporal context