A DNN-HMM-DNN Hybrid Model for Discovering Word-Like Units from Spoken Captions and Image Regions.

Liming Wang, Mark Hasegawa-Johnson

Published in: INTERSPEECH (2020)

Keyphrases

hybrid model
image regions
hidden Markov models
speech recognition
training process
image content
hybrid models
artificial neural networks
image features
back propagation neural network
support vector regression
image data
low level
co-occurrence
visual content
region features
support vector machine (SVM)
visual features
support vector machine
long term