GRAVO: Learning to Generate Relevant Audio from Visual Features with Noisy Online Videos.
Youngdo Ahn, Chengyi Wang, Yu Wu, Jong Won Shin, Shujie Liu
Published in: INTERSPEECH (2023)
Keyphrases
- visual features
- visual information
- visual data
- multimedia
- audio features
- face recognition
- image retrieval
- image search
- low level
- image classification
- motion features
- visual appearance
- visual content
- low level features
- content based video retrieval
- semantic concepts
- audio visual
- image collections
- machine learning
- video data
- active learning
- image processing