Publication: Deep multimodal semantic embeddings for speech and images.