Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment.

Published in: CoRR (2024)

Keyphrases