Speech gesture generation from the trimodal context of text, audio, and speaker identity.
Youngwoo Yoon, Bok Cha, Joo-Haeng Lee, Minsu Jang, Jaeyeon Lee, Jaehong Kim, Geehyuk Lee
Published in: ACM Trans. Graph. (2020)
Keyphrases
- audio visual
- speaker identification
- audio stream
- speech recognition
- text to speech
- prosodic features
- automatic transcription
- speaker recognition
- multi stream
- automatic speech recognition
- text to speech synthesis
- spoken documents
- text graphics
- multi modal
- synthesized speech
- human language
- speech processing
- speech signal
- broadcast news
- context aware
- speaker verification
- text generation
- information retrieval
- speaker diarization
- mel frequency cepstral coefficients
- audio signals
- English text
- audio features
- speech synthesis
- emotion recognition
- text recognition
- acoustic features
- visual speech
- hand movements
- text input
- speaker dependent
- visual data
- text mining
- multimedia
- natural language processing
- visual features
- sign language
- audio content