Cantonese neural speech synthesis from found newscasting video data and its speaker adaptation.
Raymond Chung
Published in: ISCSLP (2022)
Keyphrases
- video data
- speech synthesis
- speech recognition
- speaker adaptation
- vocal tract
- speaker independent
- video sequences
- video analysis
- speech recognizer
- video streams
- automatic speech recognition
- video database
- language model
- hidden Markov models
- video frames
- video content
- neural network
- multimedia
- pattern recognition
- noisy environments
- video retrieval
- visual data
- speech signal
- video clips
- video indexing
- key frames
- image processing
- video shots
- speaker identification
- text to speech
- image classification
- natural language processing
- image sequences