Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data.
Naoki Makishima, Satoshi Suzuki, Atsushi Ando, Ryo Masumura
Published in: INTERSPEECH (2022)
Keyphrases
- text data
- step-wise
- semi-supervised
- automatic speech recognition
- text mining
- text classification
- speech recognition
- high-dimensional
- supervised learning
- structured data
- text documents
- document collections
- text-to-speech
- semi-supervised learning
- natural language
- real-world
- unlabeled data
- high-dimensional data
- prosodic features
- labeled data
- unsupervised learning
- information retrieval
- information extraction
- knowledge discovery
- active learning
- association rules
- speech synthesis
- metadata