On Training Targets and Objective Functions for Deep-learning-based Audio-visual Speech Enhancement.
Daniel Michelsanti, Zheng-Hua Tan, Sigurdur Sigurdsson, Jesper Jensen
Published in: ICASSP (2019)
Keyphrases
- audio visual
- deep learning
- multi modal
- speech enhancement
- visual information
- unsupervised learning
- noisy environments
- machine learning
- visual data
- noise reduction
- multimedia
- supervised learning
- signal to noise ratio
- speech signal
- mental models
- training set
- higher order
- high level
- principal component analysis
- maximum likelihood
- feature vectors
- computer vision
- information retrieval