Can Decoupling Embedded Text from Images Improve Multimodal Learning?
Siddhant Bikram Shah
Published in: Tiny Papers @ ICLR (2024)
Keyphrases
- input image
- image data
- ground truth
- image features
- image database
- image analysis
- information retrieval
- supervised learning
- multimodal
- image classification
- learning algorithm
- image collections
- text retrieval
- feature points
- edge detection
- learning process
- three-dimensional
- input output
- test images
- image annotation
- complex background
- text information
- linear predictors
- text extraction