M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis.
Jinlong Xue, Yayue Deng, Fengping Wang, Ya Li, Yingming Gao, Jianhua Tao, Jianqing Sun, Jiaen Liang
Published in: CoRR (2023)
Keyphrases
- multi modal
- end to end
- text to speech synthesis
- multiscale
- text to speech
- edge detection
- wavelet transform
- congestion control
- high dimensional
- multi modality
- image processing
- audio visual
- admission control
- wireless ad hoc networks
- cross modal
- application layer
- image representation
- image segmentation
- scalable video
- wavelet coefficients