Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning.
Bingchen Zhao, Yongshuo Zong, Letian Zhang, Timothy M. Hospedales
Published in: CoRR (2024)
Keyphrases
- image understanding
- language model
- knowledge representation
- computer vision
- computational vision
- multi-hop
- object recognition
- language modeling
- object detection
- n-gram
- probabilistic model
- document retrieval
- statistical language models
- real time
- language modelling
- image annotation
- information retrieval
- neural network
- routing protocol
- image retrieval
- data transmission
- feature extraction
- image processing