Automatic speech recognition improved by two-layered audio-visual integration for robot audition.
Takami Yoshida, Kazuhiro Nakadai, Hiroshi G. Okuno
Published in: Humanoids (2009)
Keyphrases
- audio visual
- automatic speech recognition
- speech recognition
- multi modal
- multi stream
- speech signal
- speech retrieval
- hidden markov models
- visual information
- conversational speech
- audio visual speech recognition
- broadcast news
- visual data
- multimedia
- noisy environments
- image processing
- emotion recognition
- passage retrieval
- audio features
- speaker verification
- multiscale