A DNN-HMM-DNN Hybrid Model for Discovering Word-Like Units from Spoken Captions and Image Regions.
Liming Wang, Mark Hasegawa-Johnson
Published in: INTERSPEECH (2020)
Keyphrases
- hybrid model
- image regions
- hidden Markov models
- speech recognition
- training process
- image content
- hybrid models
- artificial neural networks
- image features
- back propagation neural network
- support vector regression
- image data
- low level
- co-occurrence
- visual content
- region features
- support vector machine (SVM)
- visual features
- support vector machine
- long term