Controlling formant frequencies with neural text-to-speech for the manipulation of perceived speaker age.
Ziya Khan, Lovisa Wihlborg, Cassia Valentini-Botinhao, Oliver Watts
Published in: INTERSPEECH (2023)
Keyphrases
- text to speech
- prosodic features
- formant frequencies
- speech synthesis
- automatic speech recognition
- vocal tract
- network architecture
- vowel phonemes
- programming tool
- speaker verification
- speech recognition
- neural network
- speech signal
- text to speech synthesis
- speaker recognition
- word processing
- neural model
- English text
- artificial neural networks
- broadcast news
- speaker identification
- speaker diarization
- audio visual
- probabilistic model
- computer vision