EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning.
Jaeyeon Kim
Jaeyoon Jung
Jinjoo Lee
Sang Hoon Woo
Published in:
CoRR (2024)
Keyphrases
text graphics
multimedia
audio video
signal processing
human language
database
audio visual
audio stream
visual information
digital video
text to speech
audio signals
audio content
computational complexity
multimedia information