A practical two-stage training strategy for multi-stream end-to-end speech recognition.
Ruizhi Li, Gregory Sell, Xiaofei Wang, Shinji Watanabe, Hynek Hermansky
Published in: CoRR (2019)
Keyphrases
- end-to-end
- speech recognition
- audio-visual speech recognition
- multi-stream
- hidden Markov models
- audio-visual
- isolated word
- pattern recognition
- speech synthesis
- language model
- speech signal
- automatic speech recognition
- acoustic models
- congestion control
- speech recognizer
- discriminative training
- training set
- noisy environments
- image data
- speaker identification
- computer vision