Language-based Audio Retrieval with GPT-Augmented Captions and Self-Attended Audio Clips.
Fuyu Gu, Yang Gu, Yiyan Xu, Haoran Sun, Yushan Pan, Shengchen Li, Haiyang Zhang
Published in: CSCWD (2024)
Keyphrases
- content based music retrieval
- cross modal
- multimedia information
- multimedia
- audio visual content
- human language
- audio signals
- audio visual
- signal processing
- visual information
- multi modal
- multimedia databases
- visual data
- audio content
- lifelog
- text to speech
- audio video
- relevance feedback
- digital video
- language learning
- multimedia data
- visual features
- information retrieval
- programming language
- xml documents
- semantic search
- content based retrieval
- text retrieval
- information retrieval systems