Integrating Text and Image Pre-training for Multi-modal Algorithmic Reasoning.
Zijian Zhang, Wei Liu. Published in: CoRR (2024)
Keyphrases
- multi-modal
- multiple modalities
- auto-annotation
- multiscale
- uni-modal
- image data
- image features
- multi-modality
- single modality
- input image
- web images
- video search
- image representation
- image analysis
- image retrieval
- fusing multiple
- image segmentation
- image annotation
- image content
- audio-visual
- semantic concepts
- edge detection
- low-level
- segmentation method
- image classification
- face recognition
- video content
- image collections
- cross-modal
- segmentation algorithm
- multimedia