Multimodal video-text matching using a deep bifurcation network and joint embedding of visual and textual features.
Masoomeh Nabati, Alireza Behrad
Published in: Expert Syst. Appl. (2021)
Keyphrases
- multimedia
- multimodal
- text mining
- visual and textual features
- natural language descriptions
- video search
- string matching
- multiple modalities
- multimedia data
- information retrieval
- video data
- video sequences
- matching algorithm
- image matching
- video streams
- network structure
- visual features
- video content
- image retrieval
- text detection
- image search
- higher level
- feature extraction
- databases