Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction
Zexu Pan
Gordon Wichern
Yoshiki Masuyama
François G. Germain
Sameer Khurana
Chiori Hori
Jonathan Le Roux
Published in:
ASRU (2023)
Keyphrases
audio-visual
multimodal
emotion recognition
visual data
visual information
multi-stream
multimedia
person authentication
information extraction
co-occurrence
audio features
speaker verification
audio-visual speech recognition
feature selection
image sequences
text-to-speech