Deep Multimodal Semantic Embeddings for Speech and Images.
David F. Harwath, James R. Glass
Published in: CoRR (2015)
Keyphrases
- image data
- three dimensional
- object recognition
- image retrieval
- input image
- image database
- semantically meaningful
- ground truth
- image features
- semantic categories
- image analysis
- image collections
- image classification
- test images
- visual concepts
- high level
- image annotation
- feature points
- edge detection
- speech recognition
- region of interest
- image set
- low level features
- audio visual
- low level
- video sequences
- similarity measure
- semantic classes