Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and Speaker-wise Normalization in Speech Synthesis.
Chunyu Qiang, Peng Yang, Hao Che, Xiaorui Wang, Zhongyuan Wang
Published in: ISCSLP (2022)
Keyphrases
- feature extraction
- speech synthesis
- speech recognition
- prosodic features
- speaker identification
- vocal tract
- preprocessing
- face recognition
- automatic speech recognition
- speaker recognition
- audio visual
- image processing
- speaker verification
- text to speech
- data sets
- hidden markov models
- speech signal
- speaker diarization
- pairwise
- video sequences
- speaker dependent
- multi modal
- probabilistic model
- active learning
- multimedia
- machine learning
- neural network