Enhancing Multimodal Understanding with CLIP-Based Image-to-Text Transformation.
Chang Che, Qunwei Lin, Xinyu Zhao, Jiaxin Huang, Liqiang Yu
Published in: ICBDT (2023)
Keyphrases
- image data
- input image
- image analysis
- image features
- image content
- image representation
- image classification
- multiscale
- image retrieval
- image transformations
- template matching
- edge detection
- segmentation method
- image pixels
- single image
- test images
- text retrieval
- information retrieval
- keywords
- image segmentation
- multi modal
- web images
- low level
- text information
- hough transform
- video clips
- region of interest
- multimedia
- textual information
- handwritten words
- text graphics
- image collections
- image set
- image matching
- image regions
- visual features
- text mining
- multiresolution
- object recognition
- similarity measure
- image processing