Audio to Deep Visual: Speaking Mouth Generation Based on 3D Sparse Landmarks.
Hui Fang, Dongdong Weng, Zeyu Tian, Zhen Song
Published in: VRW (2023)
Keyphrases
- visual speech
- visual information
- landmark recognition
- visual data
- cross modal
- hidden markov models
- multimedia
- visual features
- visual cues
- speaker identification
- computer vision
- audio visual
- sparse data
- compressive sensing
- sparse representation
- low level
- semantic context
- high dimensional
- gaussian mixture model
- visual perception
- audio signals
- audio video
- face recognition
- visual landmarks