AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR.
Paul Hongsuck Seo, Arsha Nagrani, Cordelia Schmid
Published in: CVPR (2023)
Keyphrases
- automatic speech recognition
- speech recognition
- experimental data
- vision system
- data sets
- audio-visual
- model selection
- computer vision
- complex systems
- noisy environments
- real time
- speech signal
- machine learning
- statistical models
- information retrieval
- probabilistic model
- prior knowledge
- natural language
- non-stationary
- image processing
- computational models
- mathematical models
- multimodal
- multimedia