FreeTalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness.
Sicheng Yang, Zunnan Xu, Haiwei Xue, Yongkang Cheng, Shaoli Huang, Mingming Gong, Zhiyong Wu. Published in: ICASSP (2024)
Keyphrases
- diffusion models
- speech recognition
- text to speech
- audio visual
- automatic speech recognition
- speaker recognition
- synthesized speech
- speaker verification
- prosodic features
- diffusion model
- hidden Markov models
- English text
- speaker identification
- gesture recognition
- speaker dependent
- vocal tract
- information diffusion
- speech synthesis
- speech signal
- multimodal interfaces
- speaker diarization
- spontaneous speech
- information retrieval
- acoustic features
- hand movements
- social networks
- text mining
- influence maximization
- NP-hard
- image denoising