Text-to-Multimodal Retrieval with Bimodal Input Fusion in Shared Cross-Modal Transformer.
Pranav Arora, Selen Pehlivan, Jorma Laaksonen
Published in: LREC/COLING (2024)
Keyphrases
- cross modal
- multi modal
- multiple modalities
- multimedia retrieval
- text retrieval
- image retrieval
- multimedia databases
- visual similarity
- information retrieval
- text mining
- content based retrieval
- visual recognition
- keywords
- semantic content
- multimedia documents
- multimedia information retrieval
- web images
- image data
- language model
- information retrieval systems
- natural language processing
- metadata