CLIP-based image captioning via unsupervised cycle-consistency in the latent space.

Romain Bielawski, Rufin VanRullen

Published in: RepL4NLP@ACL (2023)

Keyphrases

input image
image segmentation
image retrieval
image classification
image representation
latent space
image features
Bayesian framework
machine learning
similarity measure
object recognition
dimensionality reduction
graphical models
low dimensional
lower dimensional