Predicting Head Pose from Speech with a Conditional Variational Autoencoder.
David Greenwood, Stephen D. Laycock, Iain Matthews
Published in: INTERSPEECH (2017)
Keyphrases
- visual focus of attention
- head pose estimation
- audio visual
- head motion
- human head
- pose estimation
- speech recognition
- facial animation
- tracking and pose estimation
- head movements
- gaze direction
- focus of attention
- speech signal
- camera setup
- 3D objects
- hand movements
- image segmentation
- automatic speech recognition
- real time
- text to speech
- facial gestures
- optical flow
- head tracking
- speech synthesis
- pose variations
- position and orientation
- partial occlusion
- visual tracking
- face model
- dialogue system
- conditional probabilities
- upper body
- human faces
- object detection
- facial expressions
- computer vision
- neural network