Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading.
Minsu Kim, Jeong Hun Yeo, Yong Man Ro. Published in: CoRR (2022)
Keyphrases
- lip reading
- visual speech
- head tracking
- visual information
- hidden markov models
- speaker identification
- visual data
- audio signals
- multimedia
- speech recognition
- noisy environments
- head motion
- video camera
- video signals
- expression recognition
- visual features
- real time
- acoustic features
- low level
- active appearance models
- face recognition