Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and Videos.
Benet Oriol, Jordi Luque, Ferran Diego, Xavier Giró-i-Nieto
Published in: CoRR (2020)
Keyphrases
- natural language descriptions
- input image
- image database
- ground truth
- image collections
- three dimensional
- image data
- image retrieval
- image analysis
- image registration
- video images
- segmentation algorithm
- image classification
- image features
- photo collections
- symbolic descriptions
- edge detection
- static images
- image annotation
- segmentation method
- textual descriptions
- hidden Markov models
- image description
- object recognition
- video sequences