MuSE: Multi-modal target speaker extraction with visual cues.
Zexu Pan
Ruijie Tao
Chenglin Xu
Haizhou Li
Published in:
CoRR (2020)
Keyphrases
multi modal
visual cues
audio visual
low level
visual information
image annotation
information extraction
multiple visual cues
semantic concepts
multiple modalities
multi modality
high level
high dimensional
cross modal
video search
fusing multiple