Multimodal Pretraining Unmasked: Unifying the Vision and Language BERTs.
Emanuele Bugliarello, Ryan Cotterell, Naoaki Okazaki, Desmond Elliott
Published in: CoRR (2020)
Keyphrases
- computer vision
- programming language
- real time
- language learning
- vision system
- multi modal
- natural language
- database
- multimodal interfaces
- image processing
- multimedia
- natural language processing
- information systems
- data mining
- context dependent
- language processing
- representation language
- specification language
- linguistic knowledge
- english language