Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam.
Marc Delcroix, Tsubasa Ochiai, Katerina Zmolíková, Keisuke Kinoshita, Naohiro Tawara, Tomohiro Nakatani, Shoko Araki
Published in: CoRR (2020)
Keyphrases
- speech recognition
- audio visual
- speaker recognition
- automatic speech recognition
- speaker verification
- speaker identification
- vocal tract
- prosodic features
- speech recognizer
- speech synthesis
- multi modal
- speech signal
- automatic speech recognition systems
- spoken language
- automatic extraction
- synthesized speech
- Gaussian mixture model
- automatic transcription
- speaker dependent
- noisy environments
- feature selection
- speech sounds
- speaker adaptation
- information extraction
- vector quantization
- acoustic features
- visual information
- target tracking
- speaker diarization
- spontaneous speech
- recognition engine
- acoustic models
- hidden Markov models
- frequency domain
- audio stream
- endpoint detection
- text to speech