Publication: Multimodal Transformer with Multi-View Visual Representation for Image Captioning.