Understanding Cross-modal Interactions in V&L Models that Generate Scene Descriptions.

Michele Cafagna, Kees van Deemter, Albert Gatt

Published in: CoRR (2022)

Keyphrases

cross modal
information retrieval
video sequences
multi modal
visual data
feature selection
three dimensional