Multimodal Speech Recognition with Unstructured Audio Masking.
Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott
Published in: CoRR (2020)
Keyphrases
- speech recognition
- speaker identification
- audio visual
- audio visual speech recognition
- speech processing
- multi stream
- speech recognition technology
- multimedia
- hidden Markov models
- multi modal
- pattern recognition
- automatic speech recognition
- visual speech
- cepstral coefficients
- speech understanding
- visual information
- speaker independent
- language model
- speech recognizer
- speech signal
- speech synthesis
- speech recognition systems
- audio signals
- noisy environments
- mel frequency cepstral coefficients
- speech recognizers
- signal processing
- speaker recognition
- speaker dependent
- keyword spotting
- digit recognition
- visual data
- emotion recognition
- broadcast news
- speech retrieval
- face recognition
- audio signal
- isolated word
- computer vision