GIT: A Generative Image-to-text Transformer for Vision and Language.
Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang
Published in: Trans. Mach. Learn. Res. (2022)
Keyphrases
- input image
- image features
- image segmentation
- multiscale
- image classification
- image data
- image content
- low level
- single image
- image matching
- visual perception
- low level image processing
- language generation
- test images
- image pixels
- computational linguistics
- segmentation method
- text mining
- high resolution
- image retrieval
- similarity measure
- information retrieval
- image representation
- spatial information
- region of interest
- text retrieval
- text to speech
- textual and visual information
- high level
- image sequences
- english text
- object recognition
- web images
- image collections
- generative model
- image regions