Stereo-input speech recognition using sparseness-based time-frequency masking in a reverberant environment.
Yosuke Izumi, Kenta Nishiki, Shinji Watanabe, Takuya Nishimoto, Nobutaka Ono, Shigeki Sagayama. Published in: INTERSPEECH (2009)
Keyphrases
- speech signal
- speech recognition
- real time
- computer vision
- noisy environments
- dynamic environments
- text input
- speech synthesis
- image pairs
- autonomous agents
- three dimensional
- input data
- depth map
- mobile robot
- multi agent
- human visual system
- high quality
- stereo images
- feature extraction
- automatic speech recognition
- spoken language
- signal analysis
- information retrieval
- neural network