Expanding Large Pre-trained Unimodal Models with Multimodal Information Injection for Image-Text Multimodal Classification.
Tao Liang, Guosheng Lin, Mingyang Wan, Tianrui Li, Guojun Ma, Fengmao Lv
Published in: CVPR (2022)
Keyphrases
- pre-trained
- multimodal information
- image classification
- single image
- image features
- multiscale
- image content
- input image
- image retrieval
- training data
- visual data
- image representation
- machine learning
- image data
- neural network
- principal component analysis
- text classification
- low-level
- image regions
- support vector
- small number
- feature points
- video data
- training examples
- object recognition
- decision trees