Publication: Large-scale unsupervised audio pre-training for video-to-speech synthesis.