Publication: Multimodal video-text matching using a deep bifurcation network and joint embedding of visual and textual features.