XGPT: Cross-modal Generative Pre-Training for Image Captioning.
Qiaolin Xia, Haoyang Huang, Nan Duan, Dongdong Zhang, Lei Ji, Zhifang Sui, Edward Cui, Taroon Bharti, Ming Zhou
Published in: NLPCC (1) (2021)
Keyphrases
- cross modal
- image features
- image data
- image retrieval
- image content
- image representation
- image classification
- image segmentation
- image collections
- multiscale
- multi modal
- visual similarity
- visual data
- supervised learning
- spatial information
- search engine
- multimedia retrieval
- perceptual information
- semantic gap
- image understanding
- generative model
- training set
- high level