Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model.
Kanzhi Cheng, Wenpo Song, Zheng Ma, Wenhao Zhu, Zixuan Zhu, Jianbing Zhang
Published in: ACM Multimedia (2023)
Keyphrases
- real world
- single image
- high level
- conceptual model
- image analysis
- image data
- formal representation
- multiscale
- statistical model
- high resolution
- computational model
- input image
- image classification
- knowledge base
- generic model
- energy function
- training set
- similarity measure
- image segmentation
- vision system
- 3D objects
- image features
- prior knowledge
- image content
- natural language
- prior model