Synthesizing expressive speech from amateur audiobook recordings.
Éva Székely, Tamás Gábor Csapó, Bálint Tóth, Péter Mihajlik, Julie Carson-Berndsen
Published in: SLT (2012)
Keyphrases
- spontaneous speech
- audio visual
- acoustic features
- audio recordings
- speech recognition
- speech signal
- spoken language
- speech synthesis
- multi modal
- human machine interaction
- digital camera
- automatic speech recognition
- endpoint detection
- text to speech
- audio features
- multi stream
- dialogue system
- text to speech synthesis
- noisy environments
- speech processing
- visual information
- vocal tract
- visual features
- hidden markov models
- audio stream