Publication: Constructing Multi-Modal Dialogue Dataset by Replacing Text with Semantically Relevant Images.