AVLnet: Learning Audio-Visual Language Representations from Instructional Videos.
Andrew Rouditchenko, Angie W. Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogério Schmidt Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James R. Glass
Published in: Interspeech (2021)