High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units.
Junchen Lu, Berrak Sisman, Mingyang Zhang, Haizhou Li
Published in: CoRR (2023)
Keyphrases
- high quality
- text to speech
- highly accurate
- speech recognition
- low quality
- speech synthesis
- semi automatic
- computationally efficient
- speech recognition errors
- emotion recognition
- image quality
- ground truth
- fully automatic
- speech quality
- higher quality
- synthesized speech
- fundamental frequency
- high resolution
- audio visual
- neural network
- hidden markov models
- depth map
- speech signal
- active learning
- spectral features
- discrete geometry
- speech sounds
- question answering