High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units.
Junchen Lu, Berrak Sisman, Mingyang Zhang, Haizhou Li
Published in: INTERSPEECH (2023)
Keyphrases
- high quality
- text to speech
- highly accurate
- speech synthesis
- emotion recognition
- fundamental frequency
- image quality
- speech recognition
- semi automatic
- speech sounds
- speech recognition errors
- speech signal
- low quality
- voice activity detection
- continuous domains
- fully automatic
- ground truth
- higher quality
- depth map
- spoken language
- speaker identification
- high accuracy
- speech quality
- active learning
- high resolution
- finite number
- automatic speech recognition
- noisy environments
- discrete space
- computationally efficient
- super resolution