Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets.
Paul Primus, Khaled Koutini, Gerhard Widmer. Published in: CoRR (2023)
Keyphrases
- data sets
- natural language
- multimedia
- multimedia information
- cross modal
- audio visual content
- audio visual
- audio video
- information retrieval
- signal processing
- multi modal
- human language
- language processing
- lifelog
- knowledge representation
- audio signal
- multimedia retrieval
- multimedia information retrieval
- audio signals
- multimedia databases
- visual information
- audio content
- image database
- visual data
- document retrieval
- spoken documents
- machine learning
- audio recordings
- database
- music information retrieval
- natural language generation
- multimedia documents
- digital video
- content based retrieval
- retrieval systems
- test collection
- language model
- information retrieval systems
- case based reasoning