WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research.
Xinhao Mei, Chutong Meng, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley, Yuexian Zou, Wenwu Wang
Published in: IEEE ACM Trans. Audio Speech Lang. Process. (2024)
Keyphrases
- multimedia
- audio visual
- cross modal
- signal processing
- audio signals
- multimodal fusion
- audio stream
- visual information
- multi modal
- visual data
- human language
- programming language
- text to speech
- audio files
- story segmentation
- music score
- multimedia information
- cepstral features
- neural network
- multi stream
- music retrieval
- audio signal
- benchmark datasets
- object oriented