Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation.
Anna Deichler, Shivam Mehta, Simon Alexanderson, Jonas Beskow
Published in: CoRR (2023)
Keyphrases
- text to speech
- text to speech synthesis
- audio visual
- text graphics
- human language
- spoken documents
- audio stream
- speech recognition
- word counts
- text mining
- text generation
- multi stream
- english text
- speech processing
- speech synthesis
- spontaneous speech
- multimedia
- hidden markov models
- multi lingual
- multimodal interfaces
- speaker identification
- information retrieval
- audio video
- audio features
- emotion recognition
- gesture recognition
- prosodic features
- speech music discrimination
- multimodal interaction
- keywords
- text classification
- text documents
- language generation
- text input
- lexical features
- text recognition
- broadcast news