Audio-Visual Cross-Attention Network for Robotic Speaker Tracking.
Xinyuan Qian
Zhengdong Wang
Jiadong Wang
Guohui Guan
Haizhou Li
Published in:
IEEE/ACM Trans. Audio Speech Lang. Process. (2023)
Keyphrases
audio visual
multi modal
audio visual speech recognition
speaker verification
visual information
multi stream
multimedia
person authentication
temporal context
visual data
emotion recognition
data sets
audio features
image classification