Unifying Visual and Semantic Feature Spaces with Diffusion Models for Enhanced Cross-Modal Alignment.
Yuze Zheng, Zixuan Li, Xiangxian Li, Jinxing Liu, Yuqing Wang, Xiangxu Meng, Lei Meng
Published in: CoRR (2024)
Keyphrases
- cross-modal
- diffusion models
- multi-modal
- image retrieval
- feature space
- visual similarity
- semantic concepts
- diffusion model
- multimedia retrieval
- information diffusion
- high-dimensional
- visual recognition
- semantic similarity
- visual data
- multimedia databases
- perceptual information
- semantic information
- image representation
- visual features
- training set
- social networks
- visual concepts
- high-level
- low-level features
- multimedia data
- visual content
- visual information
- automatic image annotation
- feature vectors
- object recognition
- image sequences
- viral marketing