Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition.

Published in: CoRR (2024)
