Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation.
Lingting Zhu, Xian Liu, Xuanyu Liu, Rui Qian, Ziwei Liu, Lequan Yu
Published in: CVPR (2023)
Keyphrases
- diffusion models
- audio stream
- audio-visual
- multi-stream
- broadcast news
- diffusion model
- emotion recognition
- information diffusion
- multimodal interfaces
- audio signals
- speaker identification
- hidden Markov models
- speech recognition
- text-to-speech
- audio features
- social networks
- hand movements
- speech signal
- speech music discrimination
- automatic transcription
- automatic speech recognition
- gesture recognition
- influence maximization
- upper bound
- diffusion process
- steady state
- human-computer interaction