MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model.

Published in: CVPR (2023)

Keyphrases