Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis.
Kenichi Fujita, Atsushi Ando, Yusuke Ijima
Published in: CoRR (2024)
Keyphrases
- speech synthesis
- speech recognition
- prosodic features
- vocal tract
- text to speech
- automatic speech recognition
- speaker dependent
- speaker identification
- hidden markov models
- speech signal
- speaker diarization
- speech recognizer
- speech corpus
- language model
- speaker recognition
- acoustic models
- speech sounds
- noisy environments
- information extraction
- speech recognition systems
- speaker independent
- speaker verification
- manifold learning
- speech enhancement
- low dimensional
- speaker adaptation
- computer vision
- machine learning
- phoneme recognition
- data mining