Deep Audio-visual System for Closed-set Word-level Speech Recognition.
Yougen Yuan, Wei Tang, Minhao Fan, Yue Cao, Peng Zhang, Lei Xie
Published in: ICMI (2019)
Keyphrases
- word-level
- audio-visual
- speech recognition
- audio-visual speech recognition
- multimodal
- language-independent
- document images
- n-gram
- visual information
- machine translation
- language model
- hidden Markov models
- multi-stream
- automatic speech recognition
- word recognition
- document analysis
- character recognition
- noisy environments
- word segmentation
- speech signal
- pattern recognition
- visual data
- multimedia
- sentence-level
- speaker identification
- image analysis
- visual features
- feature extraction
- machine learning
- feature space
- probabilistic model
- low-level
- image processing
- image retrieval
- artificial intelligence