Transfer Learning from Audio-Visual Grounding to Speech Recognition.
Wei-Ning Hsu, David F. Harwath, James R. Glass
Published in: CoRR (2019)
Keyphrases
- transfer learning
- audio-visual
- speech recognition
- multimodal
- audio-visual speech recognition
- visual information
- labeled data
- language model
- hidden Markov models
- visual data
- active learning
- multimedia
- speech signal
- reinforcement learning
- collaborative filtering
- text classification
- pattern recognition
- automatic speech recognition
- multi-stream
- text categorization
- noisy environments
- machine learning
- semi-supervised learning
- computer vision
- learning algorithm
- semantic information
- audio features
- speaker identification
- unlabeled data
- text mining
- semi-supervised