Multimodal speech recognition: increasing accuracy using high speed video data.
Denis Ivanko, Alexey Karpov, Dmitrii Fedotov, Irina S. Kipyatkova, Dmitry Ryumin, Dmitriy Ivanko, Wolfgang Minker, Milos Zelezný
Published in: J. Multimodal User Interfaces (2018)
Keyphrases
- image processing
- speech recognition
- video data
- high speed
- pattern recognition
- multimodal information
- speech recognizers
- video streams
- video analysis
- multimedia
- video sequences
- hidden Markov models
- video frames
- speech synthesis
- speech recognizer
- language model
- video content
- automatic speech recognition
- speech signal
- noisy environments
- video retrieval
- visual data
- video clips
- speaker identification
- motion vectors
- speech recognition systems
- video database
- video shots
- temporal structure
- speech recognition technology
- computer vision
- multimedia systems
- neural network
- multi modal
- key frames
- speaker independent
- feature set