CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens.
Zhihao Du, Qian Chen, Shiliang Zhang, Kai Hu, Heng Lu, Yexin Yang, Hangrui Hu, Siqi Zheng, Yue Gu, Ziyang Ma, Zhifu Gao, Zhijie Yan
Published in: CoRR (2024)
Keyphrases
- text to speech
- speech synthesis
- text to speech synthesis
- prosodic features
- English text
- digital libraries
- programming tool
- semantic web
- learning algorithm
- semantic annotation
- supervised learning
- natural language
- domain specific
- semantic network
- high level
- word processing
- feature selection
- semantic information
- unsupervised learning
- semi supervised
- cross language information retrieval
- cross language
- text retrieval
- language resources