Multi-Modal Pre-Training for Automated Speech Recognition.
David M. Chan, Shalini Ghosh, Debmalya Chakrabarty, Björn Hoffmeister
Published in: CoRR (2021)
Keyphrases
- multi modal
- speech recognition
- Wall Street Journal corpus
- isolated word
- hidden Markov models
- automatic speech recognition
- acoustic models
- language model
- speech recognizer
- speech synthesis
- pattern recognition
- speech signal
- audio visual
- speech recognition technology
- multi modality
- speech processing
- noisy environments
- speech recognition systems
- video search
- audio visual speech recognition
- training set
- speaker independent
- image annotation
- speaker dependent
- high dimensional
- semantic concepts
- training process
- speaker identification
- information retrieval