Publication: A Hierarchical Framework with Improved Loss for Large-scale Multi-modal Video Identification.