Audio-visual speech recognition using depth information from the Kinect in noisy video conditions.
Georgios Galatas, Gerasimos Potamianos, Fillia Makedon
Published in: PETRA (2012)
Keyphrases
- audio visual
- depth information
- visual data
- depth map
- multimedia
- audio features
- multi modal
- depth cameras
- depth images
- stereo vision
- visual information
- depth data
- microsoft kinect
- depth image based rendering
- multi stream
- video data
- kinect sensor
- rgb d camera
- video sequences
- video frames
- rgbd images
- video content
- video streams
- stereo matching
- audio visual speech recognition
- video retrieval
- human computer interaction
- image sequences
- real time
- time of flight
- noisy environments
- visual features
- multi view
- low level
- spatio temporal
- training data