GIT: A Generative Image-to-text Transformer for Vision and Language.
Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, Lijuan Wang
Published in: CoRR (2022)
Keyphrases
- image data
- input image
- multiscale
- image representation
- image features
- feature points
- image retrieval
- image regions
- image classification
- visual perception
- computer vision
- test images
- image content
- spatial information
- region of interest
- single image
- image analysis
- image pixels
- generative model
- text mining
- image collections
- low level
- high resolution
- hough transform
- low level image processing
- web images
- english text
- textual and visual information
- segmentation method
- segmentation algorithm
- edge detection
- programming language
- fuzzy logic
- keywords
- similarity measure
- image segmentation