Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and Videos.

Benet Oriol, Jordi Luque, Ferran Diego, Xavier Giró-i-Nieto

Published in: CoRR (2020)

Keyphrases

natural language descriptions
input image
image database
ground truth
image collections
three dimensional
image data
image retrieval
image analysis
image registration
video images
segmentation algorithm
image classification
image features
photo collections
symbolic descriptions
edge detection
static images
image annotation
segmentation method
textual descriptions
hidden markov models
image description
object recognition
video sequences