Fusion Architectures for Word-Based Audiovisual Speech Recognition.
Michael Wand, Jürgen Schmidhuber
Published in: INTERSPEECH (2020)
Keyphrases
- speech recognition
- speech recognition systems
- speech recognizer
- wall street journal corpus
- keyword spotting
- speech recognizers
- word error rate
- hidden markov models
- language model
- automatic speech recognition
- speech processing
- speech synthesis
- handwriting recognition
- speech recognition technology
- pattern recognition
- speaker identification
- n-gram
- speech signal
- noisy environments
- video retrieval
- visual information
- word sense disambiguation
- word segmentation
- speaker diarization
- multi modal
- maximum likelihood
- mobile devices