SITTA: A Semantic Image-Text Alignment for Image Captioning.
Fabian Paischer, Thomas Adler, Markus Hofmarcher, Sepp Hochreiter
Published in: CoRR (2023)
Keyphrases
- image classification
- input image
- image retrieval
- image features
- image content
- single image
- image data
- image alignment
- image segmentation
- low level
- high resolution
- multiscale
- image regions
- image representation
- template matching
- feature points
- similarity measure
- image matching
- test images
- region of interest
- visual features
- pixel values
- semantic space
- image pixels
- scanned documents
- text information
- spatial information
- Hough transform
- segmentation method
- segmentation algorithm
- super resolution
- scale space
- image quality
- image database
- natural language
- feature extraction