Phoneme Duration Modeling Using Speech Rhythm-Based Speaker Embeddings for Multi-Speaker Speech Synthesis.
Kenichi Fujita, Atsushi Ando, Yusuke Ijima
Published in: Interspeech (2021)
Keyphrases
- speech synthesis
- speech recognition
- prosodic features
- vocal tract
- text to speech
- automatic speech recognition
- speaker dependent
- speech signal
- speaker identification
- pattern recognition
- speech corpus
- hidden markov models
- speech recognizer
- speech sounds
- speaker independent
- language model
- speaker recognition
- speaker diarization
- speaker adaptation
- speech recognition systems
- acoustic models
- automatic speech recognition systems
- noisy environments
- low dimensional
- image processing
- phoneme recognition
- feature selection
- neural network