Deep multimodal semantic embeddings for speech and images.
David F. Harwath, James R. Glass
Published in: ASRU (2015)
Keyphrases
- image data
- image database
- image analysis
- ground truth
- input image
- image collections
- three dimensional
- audio visual
- image classification
- image retrieval
- audio signals
- region of interest
- image features
- object recognition
- edge detection
- computer vision
- test images
- image registration
- semantic categories
- face recognition
- image regions
- image matching
- lighting conditions
- semantic classes
- dimensionality reduction
- visual information
- image content
- segmentation algorithm
- human computer interaction
- similarity measure
- image processing