Fine-Grained Visual Textual Alignment for Cross-Modal Retrieval Using Transformer Encoders.
Nicola Messina, Giuseppe Amato, Andrea Esuli, Fabrizio Falchi, Claudio Gennaro, Stéphane Marchand-Maillet
Published in: ACM Trans. Multim. Comput. Commun. Appl. (2021)
Keyphrases
- cross modal
- fine grained
- multi modal
- multimedia retrieval
- coarse grained
- visual similarity
- image retrieval
- multimedia databases
- multimedia
- visual recognition
- access control
- visual data
- perceptual information
- multimedia information retrieval
- content based retrieval
- keywords
- visual information
- visual content
- data lineage
- video retrieval
- information retrieval
- natural language
- image annotation
- image database
- database systems
- metadata