Phoneme recognition using visual features on speech spectrograms.
Shigeru Katagiri, Manami Yokota
Published in: ECST (1987)
Keyphrases
- visual features
- speech recognition
- speech signal
- speech synthesis
- automatic speech recognition
- automatic speech recognition systems
- visual information
- visual content
- speaker dependent
- image classification
- speech sounds
- acoustic features
- image search
- content based video retrieval
- audio features
- keywords
- image annotation
- speaker identification
- image retrieval
- hidden Markov models
- visual appearance
- low level
- text to speech
- noisy environments
- visual speech
- low level features
- image collections
- audio visual
- visual patterns
- visual data
- language model
- semantic gap
- semantic concepts
- bag of features
- video shots
- vowel phonemes
- bridge the semantic gap
- visual properties
- global features
- semantic features
- key frames