Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition.
Yash Jain, David Chan, Pranav Dheram, Aparna Khare, Olabanji Shonibare, Venkatesh Ravichandran, Shalini Ghosh
Published in: CoRR (2024)
Keyphrases
- multi modal
- multistage
- automatic speech recognition
- speech recognition
- single stage
- broadcast news
- speech signal
- conversational speech
- hidden markov models
- dynamic programming
- multi modality
- word error rate
- cross modal
- high dimensional
- video search
- semantic concepts
- audio visual
- speech corpus
- uni modal
- lot sizing
- image annotation
- neural network
- speech sounds