Self-Supervised Audio-Visual Speech Representations Learning By Multimodal Self-Distillation.

Published in: CoRR (2022)

Keyphrases