Unsupervised Quantized Prosody Representation for Controllable Speech Synthesis.

Yutian Wang, Yuankun Xie, Kun Zhao, Hui Wang, Qin Zhang

Published in: ICME (2022)

Keyphrases

speech synthesis
speech recognition
text-to-speech
prosodic features
vocal tract
machine learning
data-driven
unsupervised learning
speech corpus
neural network
feature representation
image representation
semi-supervised
image retrieval
supervised learning
supervised classification
representation scheme
information bottleneck
video sequences
computer vision
genetic algorithm