Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition.
Yash Jain, David M. Chan, Pranav Dheram, Aparna Khare, Olabanji Shonibare, Venkatesh Ravichandran, Shalini Ghosh
Published in: LREC/COLING (2024)
Keyphrases
- multi modal
- multistage
- automatic speech recognition
- speech recognition
- hidden markov models
- conversational speech
- word error rate
- speech signal
- dynamic programming
- lot sizing
- broadcast news
- multi modality
- audio visual
- single stage
- speech retrieval
- semantic concepts
- high dimensional
- cross modal
- optimal policy
- speech corpus
- video search
- language model
- multimedia
- speech sounds
- image annotation
- sufficient conditions
- spontaneous speech
- image classification
- single modality