Multi-grained encoding and joint embedding space fusion for video and text cross-modal retrieval.
Xiaotao Cui, Jing Xiao, Yang Cao, Jia Zhu
Published in: Multim. Tools Appl. (2022)
Keyphrases
- cross modal
- multimedia retrieval
- multi modal
- visual data
- text retrieval
- multimedia documents
- multimedia databases
- multimedia
- multimedia data
- image retrieval
- information retrieval
- video sequences
- visual similarity
- semantic concepts
- content based retrieval
- video data
- video frames
- text data
- keywords
- text mining
- video content
- low dimensional
- visual information
- text documents
- manifold learning
- video retrieval
- feature extraction
- visual features
- spatio temporal
- dimensionality reduction
- feature space
- high dimensional
- relevance feedback
- human activities
- space time
- text classification