Parameter Efficient Audio Captioning with Faithful Guidance Using Audio-Text Shared Latent Representation.
Arvind Krishna Sridhar, Yinyi Guo, Erik Visser, Rehana Mahfuz
Published in: ICASSP (2024)
Keyphrases
- text graphics
- multimedia
- human language
- visual information
- visual data
- audio visual
- text retrieval
- audio signals
- word counts
- information retrieval
- audio stream
- audio video
- audio signal
- cross modal
- emotion recognition
- digital video
- signal processing
- text mining
- semantic representation
- parameter space
- audio recordings
- generative model
- spoken documents