VCVTS: Multi-Speaker Video-to-Speech Synthesis Via Cross-Modal Knowledge Transfer from Voice Conversion.
Disong Wang, Shan Yang, Dan Su, Xunying Liu, Dong Yu, Helen Meng
Published in: ICASSP (2022)
Keyphrases
- speech synthesis
- knowledge transfer
- cross modal
- prosodic features
- speech recognition
- text to speech
- vocal tract
- multi modal
- visual data
- multimedia
- knowledge sharing
- multimedia retrieval
- video data
- transfer learning
- video sequences
- image retrieval
- automatic speech recognition
- video content
- multimedia data
- semantic concepts
- multimedia databases
- visual recognition
- learning tasks
- video frames
- hidden Markov models
- speaker verification
- machine learning
- speech signal
- video streams
- video analysis
- visual similarity
- active learning
- data analysis
- reinforcement learning
- decision trees
- image processing