AV-SepFormer: Cross-Attention SepFormer for Audio-Visual Target Speaker Extraction.
Jiuxin Lin
Xinyu Cai
Heinrich Dinkel
Jun Chen
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Zhiyong Wu
Yujun Wang
Helen Meng
Published in:
CoRR (2023)
Keyphrases
audio visual
multi modal
visual information
speaker verification
emotion recognition
temporal context
visual data
multi stream
information extraction
person authentication
multimedia
three dimensional
pattern recognition
audio features