Improving Audio-visual Speech Recognition Performance with Cross-modal Student-teacher Training.
Wei Li, Sicheng Wang, Ming Lei, Sabato Marco Siniscalchi, Chin-Hui Lee
Published in: ICASSP (2019)
Keyphrases
- cross modal
- audio visual speech recognition
- multi modal
- student teachers
- multi stream
- teacher training
- audio visual
- multimedia retrieval
- training set
- visual recognition
- high dimensional
- visual similarity
- visual data
- machine learning
- online learning
- text classification
- feature vectors
- image retrieval
- feature extraction