Contextual deep learning-based audio-visual switching for speech enhancement in real-world environments.
Ahsan Adeel, Mandar Gogate, Amir Hussain
Published in: Inf. Fusion (2020)
Keyphrases
- deep learning
- audio visual
- speech enhancement
- noisy environments
- noise reduction
- single channel
- multi modal
- signal to noise ratio
- speech signal
- contextual information
- unsupervised learning
- sound source
- machine learning
- visual data
- visual information
- multimedia
- mental models
- speech recognition
- image processing
- prior knowledge
- three dimensional