CASA-Net: Cross-attention and Self-attention for End-to-End Audio-visual Speaker Diarization.

Haodong Zhou, Tao Li, Jie Wang, Lin Li, Qingyang Hong
Published in: APSIPA ASC (2023)
Keyphrases
  • end to end
  • audio visual
  • multi modal
  • multi stream
  • low level
  • natural language processing
  • automatic speech recognition
  • speaker diarization