Multi-Modal Pre-Training for Automated Speech Recognition.
David M. Chan, Shalini Ghosh, Debmalya Chakrabarty, Björn Hoffmeister
Published in: ICASSP (2022)
Keyphrases
- multi modal
- speech recognition
- wall street journal corpus
- isolated word
- speech processing
- hidden markov models
- language model
- speech synthesis
- automatic speech recognition
- pattern recognition
- speech signal
- multi modality
- speech recognition technology
- noisy environments
- acoustic models
- speaker identification
- high dimensional
- audio visual
- speaker dependent
- speech recognition systems
- speech recognizer
- training process
- speaker adaptation
- semantic concepts
- discriminative training
- speaker diarization
- single modality
- neural network
- video search
- audio visual speech recognition
- image annotation
- speaker independent
- information retrieval
- uni modal
- machine learning