Two-layered audio-visual integration in voice activity detection and automatic speech recognition for robots.
Takami Yoshida, Kazuhiro Nakadai
Published in: INTERSPEECH (2010)
Keyphrases
- audio visual
- automatic speech recognition
- voice activity detection
- noisy environments
- speech recognition
- speaker verification
- multi modal
- speech signal
- visual information
- multi stream
- broadcast news
- conversational speech
- hidden markov models
- emotion recognition
- acoustic features
- audio features
- noise reduction
- multimedia
- speaker identification
- passage retrieval
- image processing
- visual data
- non stationary
- visual features
- edge detection
- pattern recognition
- computer vision