Prior-agnostic Multi-scale Contrastive Text-Audio Pre-training for Parallelized TTS Frontend Modeling.
Quanxiu Wang, Hui Huang, Mingjie Wang, Yong Dai, Jinzuomu Zhong, Benlai Tang
Published in: CoRR (2024)
Keyphrases
- text to speech
- multiscale
- prosodic features
- speech synthesis
- multimedia
- information retrieval
- keywords
- prior knowledge
- text mining
- text graphics
- text retrieval
- multiple scales
- back end
- database
- user friendly
- multi modal
- text data
- training process
- audio visual
- training corpus
- text documents
- web documents
- training samples
- spoken documents