Joint Audio/Text Training for Transformer Rescorer of Streaming Speech Recognition.
Suyoun Kim, Ke Li, Lucas Kabela, Rongqing Huang, Jiedan Zhu, Ozlem Kalinli, Duc Le
Published in: CoRR (2022)
Keyphrases
- speech recognition
- speaker identification
- wall street journal corpus
- speech processing
- speech recognition technology
- isolated word
- audio visual speech recognition
- acoustic models
- hidden markov models
- automatic speech recognition
- speech synthesis
- speech signal
- speech recognizer
- language model
- pattern recognition
- information retrieval
- broadcast news
- noisy environments
- speaker recognition
- speaker independent
- handwriting recognition
- cepstral coefficients
- signal processing
- multimedia
- speech recognition systems
- text data
- english text
- discriminative training
- maximum likelihood
- document analysis
- acoustic features
- feature space
- visual data
- speech recognizers
- feature selection
- computer vision
- neural network