IPA-CLIP: Integrating Phonetic Priors into Vision and Language Pretraining.
Chihaya Matsuhira, Marc A. Kastner, Takahiro Komamizu, Takatsugu Hirayama, Keisuke Doman, Yasutomo Kawanishi, Ichiro Ide
Published in: CoRR (2023)
Keyphrases
- programming language
- language learning
- real time
- database
- computer vision
- image processing
- learned from training data
- language processing
- vision system
- speech recognition
- prior knowledge
- natural language
- specification language
- spoken language
- object oriented programming
- computational linguistics
- visual field
- neural network