Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction.
Zexu Pan
Gordon Wichern
Yoshiki Masuyama
François G. Germain
Sameer Khurana
Chiori Hori
Jonathan Le Roux
Published in:
CoRR (2023)
Keyphrases
audio-visual
multi-modal
visual information
multi-stream
visual data
co-occurrence
emotion recognition
audio features
speaker verification
person authentication
multimedia
audio-visual speech recognition
spatio-temporal
domain knowledge
information extraction