Zero-Shot Audio Captioning Using Soft and Hard Prompts.
Yiming Zhang, Xuenan Xu, Ruoyi Du, Haohe Liu, Yuan Dong, Zheng-Hua Tan, Wenwu Wang, Zhanyu Ma
Published in: CoRR (2024)
Keyphrases
- multimedia
- audio stream
- image processing
- hard constraints
- cross modal
- signal processing
- visual information
- emotion recognition
- high level
- cepstral features
- real time
- class probability estimation
- music scores
- audio recordings
- audio visual
- control group
- learning process
- lower bound
- information systems
- information retrieval
- machine learning
- databases
- data sets