ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer.

Published in: CoRR (2023)

Keyphrases