M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval.

Layne Berry, Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Hung-yi Lee, David Harwath
Published in: CoRR (2022)
Keyphrases
  • image retrieval
  • pre-trained
  • data sets
  • neural network
  • image database
  • prior knowledge
  • image matching
  • speech signal