Publication: Hierarchical Multimodal Attention Network Based on Semantically Textual Guidance for Video Captioning.