Prompting Segmentation with Sound is Generalizable Audio-Visual Source Localizer.
Yaoting Wang
Weisong Liu
Guangyao Li
Jian Ding
Di Hu
Xi Li
Published in:
CoRR (2023)
Keyphrases
audio-visual
multimodal
sound source
visual information
multi-stream
multimedia
visual data
audio-visual speech recognition
video summarization
image segmentation
emotion recognition
person authentication
temporal context
multiscale
high-level
visual content