Towards Efficient Cross-Modal Visual Textual Retrieval using Transformer-Encoder Deep Features.
Nicola Messina, Giuseppe Amato, Fabrizio Falchi, Claudio Gennaro, Stéphane Marchand-Maillet
Published in: CoRR (2021)
Keyphrases
- cross-modal
- multi-modal
- multimedia retrieval
- perceptual information
- image retrieval
- low-level
- multimedia databases
- multimedia
- visual data
- feature vectors
- visual recognition
- visual similarity
- feature space
- information retrieval
- image features
- co-occurrence
- visual features
- feature extraction
- image content
- multimedia information retrieval
- content-based retrieval
- object recognition
- image data
- high-dimensional