Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech.
Dong Yang, Tomoki Koriyama, Yuki Saito, Takaaki Saeki, Detai Xin, Hiroshi Saruwatari
Published in: CoRR (2023)
Keyphrases
- language model
- text-to-speech
- pre-trained
- prosodic features
- speech recognition
- speech synthesis
- language modeling
- n-gram
- information retrieval
- retrieval model
- probabilistic model
- training data
- mixture model
- context-sensitive
- training examples
- speaker verification
- test collection
- query expansion
- automatic speech recognition
- speech signal
- ad hoc information retrieval
- audio-visual
- pattern recognition
- relevance model
- data sets
- multi-modal
- high-dimensional