Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR.

Zhenyang Li, Yangyang Guo, Kejie Wang, Xiaolin Chen, Liqiang Nie, Mohan S. Kankanhalli
Published in: ACM Multimedia (2023)
Keyphrases