AV-SepFormer: Cross-Attention SepFormer for Audio-Visual Target Speaker Extraction.
Jiuxin Lin
Xinyu Cai
Heinrich Dinkel
Jun Chen
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Zhiyong Wu
Yujun Wang
Helen Meng
Published in:
ICASSP (2023)
Keyphrases
text mining
audio visual
information extraction
multi modal
speaker verification
emotion recognition
visual information
person authentication
multimedia
multi stream
visual data
temporal context
audio visual speech recognition
domain knowledge
pattern recognition
audio features
training set
video sequences