Multimodal video-text matching using a deep bifurcation network and joint embedding of visual and textual features.

Masoomeh Nabati, Alireza Behrad
Published in: Expert Syst. Appl. (2021)