Speech-Text Based Multi-Modal Training with Bidirectional Attention for Improved Speech Recognition.
Yuhang Yang, Haihua Xu, Hao Huang, Eng Siong Chng, Sheng Li
Published in: ICASSP (2023)
Keyphrases
- multi-modal
- speech recognition
- isolated word
- speech signal
- speech synthesis
- acoustic models
- hidden Markov models
- automatic speech recognition
- speech processing
- speech recognizer
- language model
- noisy environments
- speaker identification
- pattern recognition
- audio-visual
- speech recognition technology
- high dimensional
- speech recognition systems
- image annotation
- speaker independent
- video search
- noisy speech
- discriminative training
- recognition engine
- speaker dependent
- speech retrieval
- word error rate
- image processing
- speech recognition errors
- speech recognizers
- visual features