Self-Supervised Audio-Visual Speech Representations Learning by Multimodal Self-Distillation.

Published in: ICASSP (2023)
