WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research.
Xinhao Mei, Chutong Meng, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley, Yuexian Zou, Wenwu Wang
Published in: CoRR (2023)
Keyphrases
- audio visual
- multimedia
- cross modal
- multimodal fusion
- human language
- visual information
- multimodal information
- audio features
- signal processing
- audio recordings
- multi modal
- language learning
- natural language
- multi stream
- audio video
- database
- visual data
- high level
- multimedia information
- audio stream
- text to speech
- music retrieval
- audio signals
- cepstral features
- language processing
- training dataset
- relevance feedback
- low level
- video sequences