Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation.

Published in: CoRR (2023)
