EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning.
Jaeyeon Kim
Jaeyoon Jung
Jinjoo Lee
Sang Hoon Woo
Published in: ICASSP (2024)
Keyphrases
multimedia
text graphics
visual information
audio visual
visual data
audio signals
signal processing
database
audio stream
audio content
vector space
audio video
cross modal
audio recordings
human language
audio signal
music information retrieval
keywords
metadata
information retrieval