Fine-grained Visual Textual Alignment for Cross-Modal Retrieval using Transformer Encoders.
Nicola Messina, Giuseppe Amato, Andrea Esuli, Fabrizio Falchi, Claudio Gennaro, Stéphane Marchand-Maillet
Published in: CoRR (2020)
Keyphrases
- cross modal
- fine grained
- multi modal
- multimedia retrieval
- coarse grained
- visual similarity
- image retrieval
- multimedia databases
- multimedia
- visual data
- perceptual information
- access control
- visual recognition
- visual content
- image database
- visual information
- content based retrieval
- information retrieval
- databases
- information retrieval systems
- metadata
- search engine
- text retrieval
- natural language
- multimedia information retrieval
- keywords