Speech-text based multi-modal training with bidirectional attention for improved speech recognition.
Yuhang Yang, Haihua Xu, Hao Huang, Eng Siong Chng, Sheng Li
Published in: CoRR (2022)
Keyphrases
- multi-modal
- speech recognition
- isolated word
- acoustic models
- speech recognizer
- speech signal
- speech synthesis
- hidden Markov models
- automatic speech recognition
- speech processing
- language model
- speech recognition technology
- audio visual
- noisy environments
- speaker identification
- pattern recognition
- recognition engine
- speech recognition systems
- high dimensional
- word error rate
- multimedia
- speech recognizers
- keyword spotting
- speech recognition errors
- noisy speech
- speaker independent
- spoken language
- discriminative training
- computer vision
- image search