Publication: Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations.