CN-CVS: A Mandarin Audio-Visual Dataset for Large Vocabulary Continuous Visual to Speech Synthesis.
Chen Chen, Dong Wang, Thomas Fang Zheng
Published in: ICASSP (2023)
Keyphrases
- audio-visual
- speech synthesis
- speech recognition
- visual information
- prosodic features
- visual data
- speaker verification
- speaker independent
- emotion recognition
- multimodal
- audio-visual speech recognition
- visual features
- vocal tract
- hidden Markov models
- language model
- speech signal
- multi-stream
- visual content
- multimedia
- automatic speech recognition
- text-to-speech
- pattern recognition
- noisy environments
- low-level
- human actions
- speaker identification
- image classification
- audio features
- semantic information
- image collections
- data management
- image data
- Bayesian networks