Minimum latency training of sequence transducers for streaming end-to-end speech recognition.
Yusuke Shinohara, Shinji Watanabe
Published in: INTERSPEECH (2022)
Keyphrases
- end to end
- speech recognition
- wall street journal corpus
- scalable video
- isolated word
- hidden markov models
- automatic speech recognition
- rate adaptation
- acoustic models
- speech processing
- pattern recognition
- speech synthesis
- language model
- speech signal
- congestion control
- noisy environments
- speech recognizer
- speech recognition systems
- speaker independent
- speech recognition technology
- stream processing
- transport layer
- training process
- content delivery
- discriminative training
- speaker identification
- speaker dependent