MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization.
Adriana Fernandez-Lopez, Honglie Chen, Pingchuan Ma, Lu Yin, Qiao Xiao, Stavros Petridis, Shiwei Liu, Maja Pantic
Published in: CoRR (2024)
Keyphrases
- speech recognition
- acoustic models
- language model
- speech recognizer
- hidden Markov models
- probabilistic model
- Wall Street Journal corpus
- speech signal
- speech processing
- isolated word
- speech understanding
- speech synthesis
- training process
- audio visual
- automatic speech recognition
- multi modal
- speech recognition systems
- speech retrieval
- data mining
- keyword spotting
- speaker independent
- natural language processing
- speech recognition technology
- computer vision
- audio visual speech recognition
- information retrieval