Asynchronous integration of audio and visual sources in bi-modal automatic speech recognition.
Paul Deléglise, Alexandrina Rogozan, Mamoun Alissali
Published in: EUSIPCO (1996)
Keyphrases
- automatic speech recognition
- broadcast news
- speech recognition
- visual information
- acoustic features
- speech signal
- visual data
- word error rate
- hidden Markov models
- conversational speech
- visual features
- business intelligence
- speaker identification
- spoken words
- spontaneous speech
- speech retrieval
- speech corpus
- noisy environments
- word recognition
- data sources
- video search
- content based video retrieval
- recognition errors
- multimedia
- image processing
- visual content
- multi-modal
- speech sounds
- text to speech
- speaker diarization
- noisy images
- probabilistic model
- speaker adaptation
- low level