Let There Be Sound: Reconstructing High Quality Speech from Silent Videos.
Ji-Hoon Kim, Jaehun Kim, Joon Son Chung
Published in: AAAI (2024)
Keyphrases
- high quality
- automatic speech recognition systems
- video sequences
- speech recognition
- video content
- speech signal
- video frames
- higher quality
- low quality
- ground truth
- high resolution
- spoken language
- human activities
- image quality
- video database
- video data
- speech synthesis
- audio visual
- content based video retrieval
- acoustic features
- automatic speech recognition
- text to speech
- audio features
- audio signal
- speaker identification
- multimodal interfaces
- recognition engine
- speaker recognition
- endpoint detection
- sound source
- video dataset
- event recognition
- video analysis
- video clips
- video surveillance
- depth map
- user generated
- low level
- moving camera
- event detection