Audio-visual feature integration based on piecewise linear transformation for noise robust automatic speech recognition.
Yosuke Kashiwagi, Masayuki Suzuki, Nobuaki Minematsu, Keikichi Hirose
Published in: SLT (2012)
Keyphrases
- audio visual
- linear transformation
- automatic speech recognition
- noisy environments
- speech recognition
- linear model
- multi modal
- speech signal
- broadcast news
- noise reduction
- hidden markov models
- visual information
- visual data
- multimedia
- image features
- metric learning
- distance metric
- image classification
- face recognition
- audio features
- sound source
- information retrieval