Publication: Cross-Modal Semantic Alignment before Fusion for Two-Pass End-to-End Spoken Language Understanding.