Transfer Learning from Audio-Visual Grounding to Speech Recognition.
Wei-Ning Hsu, David Harwath, James R. Glass
Published in: INTERSPEECH (2019)
Keyphrases
- transfer learning
- audio visual
- speech recognition
- audio visual speech recognition
- multi modal
- hidden markov models
- reinforcement learning
- visual information
- pattern recognition
- multi stream
- machine learning
- language model
- semi supervised learning
- multimedia
- labeled data
- active learning
- noisy environments
- collaborative filtering
- automatic speech recognition
- speech signal
- visual data
- text categorization
- text classification
- speaker identification
- learning algorithm
- decision trees
- eye movements
- bayesian networks
- maximum likelihood
- text mining
- natural language processing
- audio features
- recommender systems