Unsupervised Multi-scale Expressive Speaking Style Modeling with Hierarchical Context Information for Audiobook Speech Synthesis.
Xueyuan Chen, Shun Lei, Zhiyong Wu, Dong Xu, Weifeng Zhao, Helen Meng
Published in: COLING (2022)
Keyphrases
- speech synthesis
- multiscale
- speech recognition
- unsupervised learning
- coarse-to-fine
- vocal tract
- text-to-speech
- prosodic features
- data-driven
- edge detection
- supervised learning
- natural images
- learning algorithm
- local binary pattern
- agglomerative clustering
- hidden Markov models
- conversational agents
- object detection
- deep structure
- semi-supervised