Audio-visual speech synthesis using vision transformer-enhanced autoencoders with ensemble of loss functions.

Published in: Appl. Intell. (2024)

Keyphrases