Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer.
Yaoting Wang
Weisong Liu
Guangyao Li
Jian Ding
Di Hu
Xi Li
Published in:
AAAI (2024)
Keyphrases
audio visual
multi modal
visual information
visual data
image segmentation
temporal context
sound source
audio visual speech recognition
multi stream
emotion recognition
person authentication
multimedia
multiscale
video summarization
domain knowledge
high level
audio features
data sets