Time-Domain Target-Speaker Speech Separation with Waveform-Based Speaker Embedding.
Jianshu Zhao, Shengzhou Gao, Takahiro Shinozaki
Published in: INTERSPEECH (2020)
Keyphrases
- speech recognition
- speaker recognition
- audio-visual
- speaker verification
- automatic speech recognition
- speaker identification
- speaker diarization
- speaker dependent
- prosodic features
- synthesized speech
- frequency domain
- vocal tract
- multimodal
- speaker adaptation
- speech sounds
- automatic transcription
- automatic speech recognition systems
- noisy environments
- Gaussian mixture model
- vector space
- pattern recognition
- speech synthesis
- speech signal
- probabilistic neural network
- vector quantization
- cross correlation
- speech recognizer
- mel frequency cepstral coefficients
- acoustic models
- visual information
- emotion recognition
- digital images
- hidden Markov models
- spoken language