MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning.

Published in: CoRR (2022)

Keyphrases