Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR.
Zhenyang Li, Yangyang Guo, Kejie Wang, Xiaolin Chen, Liqiang Nie, Mohan S. Kankanhalli
Published in: ACM Multimedia (2023)
Keyphrases
- visual perception
- human vision
- visual field
- computer vision
- programming language
- visual information
- visual processing
- vision system
- language learning
- natural language
- visual query language
- real time
- specification language
- object oriented
- low level
- knowledge base
- data sets
- visual features
- image classification
- modeling language
- language processing
- computational linguistics
- image processing
- neural network
- formal language
- visual input
- commonsense reasoning