Cross-Speaker Emotion Transfer by Manipulating Speech Style Latents.
Suhee Jo, Younggun Lee, Yookyung Shin, Yeongtae Hwang, Taesu Kim
Published in: ICASSP (2023)
Keyphrases
- emotion recognition
- audio visual
- speech recognition
- speaker verification
- speaker recognition
- automatic speech recognition
- speaker identification
- prosodic features
- text to speech synthesis
- emotional speech
- speaker dependent
- speech signal
- automatic speech recognition systems
- vocal tract
- gaussian mixture model
- emotional state
- speech synthesis
- crime scene
- speaker diarization
- synthesized speech
- poor quality
- acoustic features
- emotion classification
- text to speech
- noisy environments
- facial expressions
- broadcast news
- automatic transcription
- hidden markov models
- speaker adaptation
- human computer interaction
- speech sounds
- speaker independent
- mel frequency cepstral coefficients
- feature extraction
- acoustic models
- affective states
- audio stream
- multiscale