ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer.
Huadai Liu, Rongjie Huang, Xuan Lin, Wenqiang Xu, Maozong Zheng, Hong Chen, Jinzheng He, Zhou Zhao
Published in: CoRR (2023)
Keyphrases
- text to speech
- speech synthesis
- programming tool
- prosodic features
- text to speech synthesis
- visual information
- multimodal interaction
- word processing
- anisotropic diffusion
- low level
- english text
- fault diagnosis
- visual features
- visual speech
- power transformers
- image processing
- diffusion process
- genetic algorithm
- pattern recognition