Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?
Letitia Parcalabescu, Anette Frank. Published in: CoRR (2024)
Keyphrases
- image data
- input image
- text information
- image collections
- image classification
- image retrieval
- image features
- image database
- keywords
- vision system
- ground truth
- image registration
- language generation
- image understanding
- computer vision
- three dimensional
- object recognition
- text detection
- text retrieval
- similarity measure
- image annotation
- textual information
- web images
- real time
- image analysis
- historical documents
- scanned images
- image regions
- feature points
- image compression
- natural language
- programming language