AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR.

Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid

Published in: CoRR (2023)

Keyphrases

speech recognition
audio visual
automatic speech recognition
statistical models
data sets
probabilistic model
information retrieval
computer vision
image processing
case study
model selection
computational models
endpoint detection