Publication: VICTR: Visual Information Captured Text Representation for Text-to-Vision Multimodal Tasks.