Zero-shot text-to-speech synthesis conditioned using self-supervised speech representation model.

Published in: CoRR (2023)

Keyphrases