Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction.
Koki Maeda, Shuhei Kurita, Taiki Miyanishi, Naoaki Okazaki
Published in: CoRR (2024)
Keyphrases
- evaluation method
- visual context
- evaluation methods
- visual scene
- evaluation model
- temporal context
- computer vision
- object detection
- vision system
- semantic context
- visual features
- scene interpretation
- information extraction
- real time
- high level
- image processing
- fuzzy comprehensive evaluation method
- listed companies
- eye movements
- multi modal
- low level
- multiscale