From 3-D speaker cloning to text-to-audiovisual-speech.
Sascha Fagel, Frédéric Elisei, Gérard Bailly
Published in: INTERSPEECH (2008)
Keyphrases
- audio visual
- speech recognition
- speaker recognition
- synthesized speech
- multi modal
- automatic speech recognition
- speaker verification
- text to speech synthesis
- emotion recognition
- prosodic features
- text to speech
- speaker identification
- English text
- database
- information retrieval
- language generation
- speaker diarization
- speech signal
- text documents
- audio features
- text input
- vocal tract
- speech synthesis
- multi lingual
- visual information
- text mining
- text retrieval
- text recognition
- spontaneous speech
- conversational speech
- visual data
- hidden Markov models
- lexical features
- broadcast news
- speaker adaptation
- speaker dependent
- video search
- text data