Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data.
Naoki Makishima, Satoshi Suzuki, Atsushi Ando, Ryo Masumura. Published in: CoRR (2022)
Keyphrases
- text data
- step-wise
- semi-supervised
- automatic speech recognition
- text mining
- text classification
- speech recognition
- supervised learning
- high dimensional
- structured data
- semi-supervised learning
- labeled data
- document collections
- text documents
- training set
- text to speech
- high dimensional data
- data sets
- prosodic features
- unlabeled data
- unsupervised learning
- active learning
- speech signal
- decision trees
- information retrieval
- image classification
- question answering
- information extraction
- pairwise
- audio visual
- clustering algorithm
- speech synthesis
- web pages