Multi-Modal Perception Attention Network with Self-Supervised Learning for Audio-Visual Speaker Tracking.
Yidi Li
Hong Liu
Hao Tang
Published in: AAAI (2022)
Keyphrases
audio visual
multi modal
audio visual speech recognition
multi stream
speaker verification
visual information
multi modality
high level
image data
image annotation
video search
audio features