Parameter Efficient Audio Captioning With Faithful Guidance Using Audio-text Shared Latent Representation.
Arvind Krishna Sridhar, Yinyi Guo, Erik Visser, Rehana Mahfuz
Published in: CoRR (2023)
Keyphrases
- text graphics
- multimedia
- audio video
- emotion recognition
- cross media retrieval
- human language
- audio visual
- spoken documents
- audio stream
- visual information
- digital video
- multimedia information
- metadata
- database
- semantic representation
- visual data
- text retrieval
- text to speech
- multi modal
- signal processing
- information retrieval systems
- text mining
- word counts
- information retrieval