EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Masked Audio Gesture Modeling.
Haiyang Liu, Zihao Zhu, Giorgio Becherini, Yichen Peng, Mingyang Su, You Zhou, Naoya Iwamoto, Bo Zheng, Michael J. Black
Published in: CoRR (2024)
Keyphrases
- multimodal interfaces
- gesture recognition
- hand movements
- multi stream
- audio visual
- hidden Markov models
- sign language
- multimedia
- hand gestures
- audio stream
- speech recognition
- audio signals
- emotion recognition
- speaker identification
- speech processing
- automatic transcription
- human computer interaction
- speech signal
- acoustic signals
- audio video
- cepstral features
- recognition engine
- linear predictive coding
- gaze control
- American Sign Language
- broadcast news
- audio features
- learning mechanism
- visual data
- multi modal
- language model