Deep Multimodal Semantic Embeddings for Speech and Images.

David F. Harwath, James R. Glass

Published in: CoRR (2015)

Keyphrases

image data
three dimensional
object recognition
image retrieval
input image
image database
semantically meaningful
ground truth
image features
semantic categories
image analysis
image collections
image classification
test images
visual concepts
high level
image annotation
feature points
edge detection
speech recognition
region of interest
image set
low level features
audio visual
low level
video sequences
similarity measure
semantic classes