PuzzleVQA: Diagnosing Multimodal Reasoning Challenges of Language Models with Abstract Visual Patterns.
Yew Ken Chia, Vernon Toh, Deepanway Ghosal, Lidong Bing, Soujanya Poria
Published in: ACL (Findings) (2024)
Keyphrases
- language model
- visual patterns
- probabilistic model
- information retrieval
- generative model
- n-gram
- multimodal
- test collection
- natural images
- visual features
- natural scenes
- image features
- higher level
- information extraction
- pattern discovery
- low level
- statistical modeling
- text classification
- texture synthesis
- feature extraction