Towards Versatile and Efficient Visual Knowledge Injection into Pre-trained Language Models with Cross-Modal Adapters.
Xinyun Zhang, Haochen Tan, Han Wu, Mingjie Zhan, Ding Liang, Bei Yu
Published in: CoRR (2023)
Keyphrases
- cross modal
- language model
- multi modal
- language modeling
- pre trained
- n gram
- document retrieval
- probabilistic model
- speech recognition
- visual data
- image retrieval
- multimedia retrieval
- information retrieval
- retrieval model
- visual similarity
- query expansion
- test collection
- multimedia databases
- document collections
- machine learning
- low level
- computer vision