VLDeformer: Vision-Language Decomposed Transformer for fast cross-modal retrieval.
Lisai Zhang, Hongfa Wu, Qingcai Chen, Yimeng Deng, Joanna Siebert, Zhonghua Li, Yunpeng Han, Dejiang Kong, Zhao Cao
Published in: Knowl. Based Syst. (2022)
Keyphrases
- cross modal
- multi modal
- multimedia retrieval
- image retrieval
- visual similarity
- multimedia databases
- computer vision
- visual recognition
- information retrieval
- multimedia
- perceptual information
- visual data
- relevance feedback
- multimedia information retrieval
- feature selection
- image understanding
- document retrieval
- similarity search
- image database
- natural language