Text-to-Audio Grounding: Building Correspondence Between Captions and Sound Events.
Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu
Published in: CoRR (2021)
Keyphrases
- audio content
- text graphics
- temporal information
- multimedia
- news stories
- news video
- visual information
- audio visual
- audio signal
- event detection
- text to speech
- visual features
- cross media retrieval
- point correspondences
- semantic context
- human language
- text mining
- metadata
- music information retrieval
- text retrieval
- text documents
- database
- visual data
- text data
- multimedia content
- video content
- web documents
- signal processing
- keywords