Transformer with multi-level grid features and depth pooling for image captioning.
Doanh C. Bui, Tam V. Nguyen, Khang Nguyen
Published in: Mach. Vis. Appl. (2024)
Keyphrases
- image features
- low level
- image data
- test images
- extracted features
- extracting features
- image regions
- image classification
- image segmentation
- image representation
- input image
- spatial distribution
- single image
- image description
- multiscale
- invariant features
- feature representation
- image pixels
- keypoints
- original images
- high resolution
- image analysis
- spatial pooling
- matching process
- grey level
- global features
- fault diagnosis
- feature vectors
- image content
- feature extraction
- similarity measure
- feature values
- feature detectors
- salient features
- image matching
- sample images
- feature space
- image retrieval
- feature points
- spatial information
- spatial relationships
- image set
- region of interest
- fuzzy logic
- intensity images
- image restoration
- geometric constraints
- feature descriptors
- object recognition
- textural features
- relative depth
- feature selection