Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection.
Ruijie Tao
Zexu Pan
Rohan Kumar Das
Xinyuan Qian
Mike Zheng Shou
Haizhou Li
Published in:
CoRR (2021)
Keyphrases
audio visual
long term
multi modal
visual information
multimedia
speaker verification
emotion recognition
person authentication
visual data
audio features
temporal context
multi stream
high level
image data
low level