DMMAN: A two-stage audio-visual fusion framework for sound separation and event localization.
Ruihan Hu
Songbin Zhou
Zhi-Ri Tang
Sheng Chang
Qijun Huang
Yisen Liu
Wei Han
Edmond Qi Wu
Published in:
Neural Networks (2021)
Keyphrases
audio visual
fusion framework
sound source
multi modal
visual information
fusion process
visual data
combining multiple
multimedia
multi stream
temporal context
audio visual speech recognition
data sets
video content
video data
text mining
spatio temporal