AFL-Net: Integrating Audio, Facial, and Lip Modalities with Cross-Attention for Robust Speaker Diarization in the Wild.
Yongkang Yin
Xu Li
Ying Shan
Yuexian Zou
Published in:
CoRR (2023)
Keyphrases
speaker diarization
audio stream
speaker identification
facial features
emotion recognition
broadcast news
multimedia
face recognition
visual data
information retrieval
facial expressions
human computer interaction
speech recognition
facial images
noisy environments