AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR.

Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid

Published in: CVPR (2023)

Keyphrases

automatic speech recognition
speech recognition
experimental data
vision system
data sets
audio-visual
model selection
computer vision
complex systems
noisy environments
real time
speech signal
machine learning
statistical models
information retrieval
probabilistic model
prior knowledge
natural language
non-stationary
image processing
computational models
mathematical models
multi-modal
multimedia