Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation.
Lingting Zhu, Xian Liu, Xuanyu Liu, Rui Qian, Ziwei Liu, Lequan Yu
Published in: CoRR (2023)
Keyphrases
- diffusion models
- audio visual
- audio stream
- multi stream
- broadcast news
- diffusion model
- information diffusion
- multimodal interfaces
- audio signals
- speaker identification
- emotion recognition
- hidden markov models
- speech recognition
- text to speech
- gesture recognition
- hand movements
- audio features
- speech music discrimination
- visual information
- automatic transcription
- automatic speech recognition
- speech signal
- information flow
- influence maximization
- greedy algorithm
- image enhancement
- natural images
- social networks