Listen, Look and Deliberate: Visual Context-Aware Speech Recognition Using Pre-Trained Text-Video Representations.

Shahram Ghorbani, Yashesh Gaur, Yu Shi, Jinyu Li
Published in: SLT (2021)