CASA-Net: Cross-attention and Self-attention for End-to-End Audio-visual Speaker Diarization.

Published in: APSIPA ASC (2023)

Keyphrases