Publication: Transform, contrast and tell: Coherent entity-aware multi-image captioning.