Improving Audio Captioning Models with Fine-Grained Audio Features, Text Embedding Supervision, and LLM Mix-Up Augmentation.
Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-Weon Jung, François G. Germain, Jonathan Le Roux, Shinji Watanabe
Published in: ICASSP (2024)