Unsupervised Audio-Caption Aligning Learns Correspondences Between Individual Sound Events and Textual Phrases.
Huang Xie
Okko Räsänen
Konstantinos Drossos
Tuomas Virtanen
Published in:
ICASSP (2022)
Keyphrases
multimedia
natural language
keywords
event detection
soccer video
audio signal
point correspondences
event sequences
unsupervised learning
audio content
temporal information
semantic information
machine learning
semi supervised
image features
information extraction
metadata