Publication: Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval.