Dual-adaptive interactive transformer with textual and visual context for image captioning.
Lizhi Chen, Kesen Li
Published in: Expert Syst. Appl. (2024)
Keyphrases
- visual context
- input image
- image data
- image classification
- single image
- image features
- image content
- image segmentation
- temporal context
- image retrieval
- image representation
- multiscale
- object detection
- scene interpretation
- multi modal
- computer graphics
- spatial information
- edge map
- high level
- semantic context
- multimedia