Audio-Visual Speech Enhancement and Separation by Utilizing Multi-Modal Self-Supervised Embeddings.
I-Chun Chern, Kuo-Hsuan Hung, Yi-Ting Chen, Tassadaq Hussain, Mandar Gogate, Amir Hussain, Yu Tsao, Jen-Cheng Hou
Published in: ICASSP Workshops (2023)
Keyphrases
- audio-visual
- multi-modal
- speech enhancement
- single channel
- sound source
- noisy environments
- noise reduction
- speech signal
- signal-to-noise ratio
- linear prediction
- low dimensional
- high dimensional
- audio features
- dimensionality reduction
- speech recognition
- computer vision
- image data
- Wiener filter
- object recognition
- multimedia