Speech recognition based on Itakura-Saito divergence and dynamics/sparseness constraints from mixed sound of speech and music by non-negative matrix factorization.
Naoaki Hashimoto, Shoichi Nakano, Kazumasa Yamamoto, Seiichi Nakagawa
Published in: INTERSPEECH (2014)
Keyphrases
- speech recognition
- non-negative matrix factorization
- relative entropy
- Bregman divergences
- speech signal
- speech synthesis
- automatic speech recognition
- speech recognizer
- hidden Markov models
- nonnegative matrix factorization
- pattern recognition
- speech processing
- speaker identification
- matrix factorization
- language model
- speech recognition systems
- acoustic features
- document clustering
- principal component analysis
- audio signal
- speech recognition technology
- sparse representation
- information theoretic
- isolated word
- sound source
- speech recognizers
- neural network
- noisy environments
- missing data
- feature extraction
- image processing
- speaker independent
- speaker adaptation
- noisy speech
- Mahalanobis distance
- speaker dependent
- music information retrieval
- data mining