Fusing Audio and Metadata Embeddings Improves Language-based Audio Retrieval.
Paul Primus, Gerhard Widmer
Published in: CoRR (2024)
Keyphrases
- multimedia information
- multimedia
- metadata
- audio content
- cross modal
- audio visual content
- semantic search
- multimedia data
- multimedia information retrieval
- information retrieval
- audio visual
- multimedia documents
- databases
- text to speech
- emotion recognition
- multimedia content
- multimedia databases
- content based retrieval
- digital video
- natural language
- semantic content
- retrieval systems
- multi modal
- digital libraries
- visual information
- manifold learning
- lifelog
- audio signals
- audio features
- music information retrieval
- relevance feedback
- image database
- language learning
- programming language
- signal processing