MAMO: Fine-Grained Vision-Language Representations Learning with Masked Multimodal Modeling.

Published in: SIGIR (2023)
