WIT: Wikipedia-based Image Text Dataset for Multimodal Multilingual Machine Learning.
Krishna Srinivasan, Karthik Raman, Jiecao Chen, Michael Bendersky, Marc Najork
Published in: CoRR (2021)
Keyphrases
- machine learning
- image data
- image dataset
- image content
- single image
- multiscale
- image features
- segmentation method
- image representation
- image collections
- test images
- image analysis
- image set
- input image
- image classification
- low level
- image segmentation
- edge detection
- supervised machine learning
- segmentation algorithm
- semantic information
- image retrieval
- computer vision
- world knowledge
- text retrieval
- street view
- web images
- multi modal
- text classification
- object recognition
- keywords
- natural language processing
- high resolution
- link structure
- digital libraries
- multi lingual
- image sequences
- wikipedia pages