OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents.
Hugo Laurençon, Lucile Saulnier, Léo Tronchon, Stas Bekman, Amanpreet Singh, Anton Lozhkov, Thomas Wang, Siddharth Karamcheti, Alexander M. Rush, Douwe Kiela, Matthieu Cord, Victor Sanh
Published in: NeurIPS (2023)
Keyphrases
- web scale
- million images
- text documents
- web images
- text mining
- image content
- image retrieval
- image data
- image features
- text classification
- image search
- image classification
- image representation
- text categorization
- low level
- image set
- multiscale
- keywords
- information extraction
- image regions
- image collections
- image segmentation
- wordnet
- image annotation
- visual information
- named entities
- database
- data sets
- supervised learning
- high level