CASA-Net: Cross-attention and Self-attention for End-to-End Audio-visual Speaker Diarization.
Haodong Zhou
Tao Li
Jie Wang
Lin Li
Qingyang Hong
Published in:
APSIPA ASC (2023)
Keyphrases
end-to-end
audio-visual
multi-modal
multi-stream
low-level
natural language processing
automatic speech recognition
speaker diarization