MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for speech recognition.
Xiaohuan Zhou, Jiaming Wang, Zeyu Cui, Shiliang Zhang, Zhijie Yan, Jingren Zhou, Chang Zhou
Published in: INTERSPEECH (2023)
Keyphrases
- multi modal
- speech recognition
- multi task
- learning tasks
- language model
- hidden markov models
- pattern recognition
- automatic speech recognition
- speech signal
- multi class
- supervised learning
- feature selection
- audio visual
- transfer learning
- speaker identification
- multi modality
- learning problems
- video search
- training set
- bayesian networks
- bit rate
- motion estimation
- probabilistic model
- high dimensional
- maximum likelihood