Enhancing Audio Generation Diversity with Visual Information.
Zeyu Xie, Baihan Li, Xuenan Xu, Mengyue Wu, Kai Yu
Published in: CoRR (2024)
Keyphrases
- visual information
- visual features
- visual data
- audio visual
- low level
- visual content
- content based image retrieval systems
- visual cues
- human visual system
- eye movements
- visual information retrieval
- textual information
- content based image
- multi modal
- low level features
- visual input
- image sequences
- prior knowledge
- high dimensional