Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis.
Karolos Nikitaras, Konstantinos Klapsas, Nikolaos Ellinas, Georgia Maniati, June Sig Sung, Inchul Hwang, Spyros Raptis, Aimilios Chalamandaris, Pirros Tsiakoulis
Published in: CoRR (2022)