Beyond Image-Text Matching: Verb Understanding in Multimodal Transformers Using Guided Masking.

Published in: CoRR (2024)
