Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning.
Xu Yang, Hanwang Zhang, Chongyang Gao, Jianfei Cai
Published in: CoRR (2022)
Keyphrases
- learning process
- input image
- low level
- multiscale
- learning algorithm
- auto annotation
- neural network
- single image
- image data
- image segmentation
- image classification
- visual perception
- image features
- segmentation algorithm
- network architecture
- visual data
- image content
- spatial information
- perceptual information
- visual appearance
- visual cues
- image collections
- keypoints
- segmentation method
- visual information
- similarity measure
- image pixels
- image representation
- web images
- learning rules
- visual features
- natural images
- visual effects
- edge detection
- high level