Audio2Gestures: Generating Diverse Gestures from Speech Audio with Conditional Variational Autoencoders.
Jing Li, Di Kang, Wenjie Pei, Xuefei Zhe, Ying Zhang, Zhenyu He, Linchao Bao
Published in: CoRR (2021)
Keyphrases
- audio visual
- audio stream
- audio signals
- multimedia
- hidden Markov models
- gesture recognition
- cepstral features
- speaker identification
- signal processing
- hand gestures
- audio features
- broadcast news
- emotion recognition
- text to speech
- hand movements
- visual information
- audio video
- spoken words
- sign language
- linear predictive coding
- speech processing
- speech recognition
- digital audio
- automatic transcription
- visual data
- image segmentation
- neural network
- visual speech
- digital video
- human robot interaction
- spontaneous speech
- audio files
- acoustic signals
- optical flow
- multi modal
- Bayesian networks
- audio recordings
- prosodic features
- multiscale