Layer-wise enhanced transformer with multi-modal fusion for image caption.
Jingdan Li, Yi Wang, Dexin Zhao
Published in: Multim. Syst. (2023)
Keyphrases
- image classification
- input image
- image analysis
- image representation
- image content
- image features
- image data
- single image
- image retrieval
- multiscale
- test images
- image segmentation
- caption text
- image pixels
- high resolution
- edge detection
- feature points
- data sets
- segmentation algorithm
- fault diagnosis
- image regions
- bit rate
- image structure
- low level
- middle layer
- similarity measure
- multi-modal fusion