Joint Audio/Text Training for Transformer Rescorer of Streaming Speech Recognition.
Suyoun Kim, Ke Li, Lucas Kabela, Ron Huang, Jiedan Zhu, Ozlem Kalinli, Duc Le
Published in: EMNLP (Findings) (2022)
Keyphrases
- speech recognition
- speaker identification
- speech processing
- Wall Street Journal corpus
- speech recognition technology
- hidden Markov models
- audio-visual speech recognition
- isolated word
- acoustic models
- automatic speech recognition
- language model
- noisy environments
- pattern recognition
- speech recognition systems
- multimedia
- speech synthesis
- speech recognizer
- speech signal
- speaker independent
- audio-visual
- broadcast news
- cepstral coefficients
- speaker recognition
- training process
- text data
- visual information
- multimodal
- information retrieval
- text-to-speech
- students with learning disabilities
- English text
- signal processing