Masked cross self-attention encoding for deep speaker embedding.

Soonshin Seo, Daniel Jun Rim, Junseok Oh, Ji-Hwan Kim

Published in: CoRR (2020)

Keyphrases

vector space
machine learning
multi modal
data points
speech recognition
visual attention
data hiding
focus of attention
encoding scheme
speaker recognition
encoding schemes
automatic transcription