Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation.
Shih-Lun Wu, Xuankai Chang, Gordon Wichern, Jee-weon Jung, François G. Germain, Jonathan Le Roux, Shinji Watanabe
Published in: CoRR (2023)