XGPT: Cross-modal Generative Pre-Training for Image Captioning.
Qiaolin Xia, Haoyang Huang, Nan Duan, Dongdong Zhang, Lei Ji, Zhifang Sui, Edward Cui, Taroon Bharti, Xin Liu, Ming Zhou
Published in: CoRR (2020)
Keyphrases
- cross modal
- image retrieval
- image classification
- image content
- image data
- multiscale
- image features
- multi modal
- visual similarity
- test images
- image segmentation
- visual data
- web images
- image representation
- image collections
- perceptual information
- image set
- supervised learning
- low level
- similarity measure
- spatial relationships
- semantic gap
- metadata
- information retrieval