Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation

Nobuyuki Morioka, Heiga Zen, Nanxin Chen, Yu Zhang, Yifan Ding

Published in: CoRR (2022)

Keyphrases

text to speech
speaker adaptation
speech synthesis
speech recognition
vocal tract
maximum likelihood
automatic speech recognition
prosodic features
speaker dependent
video sequences
programming tool
speech recognizer
video shots
text to speech synthesis
word processing
video content
speaker independent
video data
key frames
pattern recognition
neural network
speech signal
image acquisition
image processing
computer vision