Multimodal SpeakerBeam: Single Channel Target Speech Extraction with Audio-Visual Speaker Clues.
Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Atsunori Ogawa, Tomohiro Nakatani
Published in: INTERSPEECH (2019)
Keyphrases
- audio visual
- single channel
- sound source
- multi modal
- multi channel
- speech enhancement
- multi stream
- speaker verification
- visual information
- emotion recognition
- multimodal fusion
- independent component analysis
- frequency domain
- visual data
- multimedia
- wavelet decomposition
- audio features
- prior information
- wiener filter
- hidden markov models
- metadata
- feature vectors
- image processing