Improving robustness of spontaneous speech synthesis with linguistic speech regularization and pseudo-filled-pause insertion.
Yuta Matsunaga, Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari. Published in: SSW (2023)
Keyphrases
- speech synthesis
- speech recognition
- text to speech
- vocal tract
- prosodic features
- speech corpus
- speech signal
- natural language
- computational efficiency
- computer vision
- noisy environments
- image acquisition
- automatic speech recognition
- parameter selection
- image coding
- language model
- natural language processing
- facial expressions