DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer.

Keon Lee, Dong Won Kim, Jaehyeon Kim, Jaewoong Cho

Published in: CoRR (2024)

Keyphrases

text to speech
speech synthesis
text to speech synthesis
programming tool
prosodic features
fuzzy logic
highly scalable
word processing
power system
highly efficient
data sets
image processing
high level
pattern recognition
multi modal
diffusion process