ASR is all you need: cross-modal distillation for lip reading.
Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman
Published in: CoRR (2019)
Keyphrases
- cross modal
- lip reading
- automatic speech recognition
- multi modal
- speaker identification
- head tracking
- speech recognition
- speech signal
- noisy environments
- expression recognition
- multimedia retrieval
- visual recognition
- visual data
- multimedia databases
- image retrieval
- multiscale
- facial expressions
- visual similarity
- hidden Markov models
- object recognition