Leveraging Acoustic Contextual Representation by Audio-textual Cross-modal Learning for Conversational ASR.
Kun Wei
Yike Zhang
Sining Sun
Lei Xie
Long Ma
Published in:
INTERSPEECH (2022)
Keyphrases
cross modal
perceptual information
multi modal
visual recognition
learning algorithm
learning tasks
multimedia
visual information
visual data
natural language
query processing
image classification
image representation
visual content