Phoneme-to-Audio Alignment with Recurrent Neural Networks for Speaking and Singing Voice.
Yann Teytaut, Axel Roebel. Published in: Interspeech (2021)
Keyphrases
- recurrent neural networks
- prosodic features
- speech synthesis
- text to speech
- audio features
- music information retrieval
- emotion recognition
- speech recognition
- feed forward
- neural network
- acoustic features
- reservoir computing
- complex valued
- audio visual
- feedforward neural networks
- mel frequency cepstral coefficients
- speaker verification
- artificial neural networks
- recurrent networks
- speech sounds
- visual speech
- automatic speech recognition
- multimedia
- echo state networks
- cascade correlation
- speaker identification
- hidden markov models
- music retrieval
- long short term memory
- context dependent
- visual information
- broadcast news
- nonlinear dynamic systems
- long term
- voice activity detection
- spontaneous speech
- vocal tract
- chaotic time series
- language model
- learning algorithm