FuseCap: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions.
Noam Rotstein, David Bensaïd, Shaked Brody, Roy Ganz, Ron Kimmel
Published in: CoRR (2023)
Keyphrases
- visual data
- language model
- visual features
- image data
- image content
- language modeling
- visual information
- visual content
- n-gram
- probabilistic model
- image classification
- image features
- image retrieval
- video data
- image sequences
- image representation
- contextual information
- high dimensional data
- input image
- image database
- multimedia data
- high dimensional
- text classification
- spatial relationships
- image regions
- information retrieval
- dimensionality reduction
- keywords
- search engine