DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding.
Jeongsoo Choi, Joanna Hong, Yong Man Ro
Published in: ICCV (2023)
Keyphrases
- speech synthesis
- vision guided
- speech recognition
- prosodic features
- mobile robot navigation
- vocal tract
- text to speech
- video sequences
- mobile robot
- natural scenes
- video data
- video frames
- real time
- pattern recognition
- multimedia
- automatic speech recognition
- speaker verification
- hidden markov models
- language model
- key frames
- video surveillance
- natural images
- image analysis
- neural network