Speech Rhythm-Based Speaker Embeddings Extraction from Phonemes and Phoneme Duration for Multi-Speaker Speech Synthesis.
Kenichi Fujita, Atsushi Ando, Yusuke Ijima
Published in: IEICE Trans. Inf. Syst. (2024)
Keyphrases
- speech synthesis
- speech recognition
- prosodic features
- vocal tract
- text to speech
- automatic speech recognition
- speaker dependent
- speech signal
- speaker identification
- hidden markov models
- speech corpus
- language model
- speech recognition systems
- speech recognizer
- pattern recognition
- speech sounds
- speaker diarization
- speech enhancement
- speaker independent
- acoustic models
- speaker recognition
- information extraction
- computer vision
- neural network
- speaker adaptation
- automatic speech recognition systems
- noisy environments
- audio visual
- low dimensional
- dimensionality reduction
- image processing
- machine learning