Let There Be Sound: Reconstructing High Quality Speech from Silent Videos.
Ji-Hoon Kim, Jaehun Kim, Joon Son Chung
Published in: CoRR (2023)
Keyphrases
- high quality
- speech recognition
- low quality
- automatic speech recognition systems
- audio features
- recognition engine
- video sequences
- acoustic features
- ground truth
- text to speech
- higher quality
- speech signal
- audio visual
- spoken language
- image quality
- video database
- automatic speech recognition
- video surveillance
- video content
- video frames
- endpoint detection
- high resolution
- sound source
- discrete tomography
- speech synthesis
- audio signal
- key frames
- noisy environments
- multi modal
- broadcast news
- temporal coherence
- speaker verification
- event recognition
- emotion recognition
- video segments
- content based video retrieval
- visual data
- video clips