Audio-visual deep learning for noise robust speech recognition.
Jing Huang, Brian Kingsbury
Published in: ICASSP (2013)
Keyphrases
- speech recognition
- audio-visual
- deep learning
- noisy environments
- audio-visual speech recognition
- multi-stream
- hidden Markov models
- multimodal
- unsupervised learning
- language model
- speech signal
- speaker identification
- automatic speech recognition
- visual information
- machine learning
- pattern recognition
- noise reduction
- multimedia
- visual data
- audio features
- image sequences
- information retrieval
- generative model
- co-occurrence
- high-dimensional