Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment.
Kim Sung-Bin, Arda Senocak, Hyunwoo Ha, Andrew Owens, Tae-Hyun Oh
Published in: CVPR (2023)
Keyphrases
- visual scene
- visual information
- visual data
- visual features
- visual attention
- complex scenes
- vision system
- low level
- natural images
- spatial relations
- audio signal
- multimedia
- natural scenes
- human visual system
- eye movements
- object recognition
- audio content
- image processing
- low level features
- image collections
- saliency map
- image database
- video sequences