Semantic retrieval of personal photos using a deep autoencoder fusing visual features with speech annotations represented as word/paragraph vectors.
Hung-tsung Lu, Yuan-ming Liou, Hung-yi Lee, Lin-Shan Lee
Published in: INTERSPEECH (2015)
Keyphrases
- semantic retrieval
- visual features
- image annotation
- personal photos
- automatic annotation
- keywords
- web images
- image search
- visual information
- image classification
- visual content
- image collections
- image retrieval
- low level features
- low level
- audio visual
- photo collections
- semantic gap
- semantic concepts
- image understanding
- key frames
- object categories
- multi modal
- visual concepts
- image sequences
- object detection
- image features
- visual data
- higher level
- search engine
- feature space
- multiscale