AVLnet: Learning Audio-Visual Language Representations from Instructional Videos.
Andrew Rouditchenko, Angie W. Boggust, David Harwath, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Rogério Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James R. Glass
Published in: CoRR (2020)