Speech-to-lip movement synthesis based on the EM algorithm using audio-visual HMMs.
Eli Yamamoto, Satoshi Nakamura, Kiyohiro Shikano
Published in: ICSLP (1998)
Keyphrases
- EM algorithm
- audio visual
- multi stream
- audio visual speech recognition
- expectation maximization
- multi modal
- hidden Markov models
- mixture model
- maximum likelihood
- generative model
- visual information
- visual data
- Gaussian mixture model
- emotion recognition
- temporal context
- speech recognition
- probabilistic model
- multimedia
- visual features
- machine learning
- semantic information
- feature extraction
- visual speech