Publication: Geometry-Entangled Visual Semantic Transformer for Image Captioning.