Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources.
Angelina McMillan-Major, Zaid Alyafeai, Stella Biderman, Kimbo Chen, Francesco De Toni, Gérard Dupont, Hady Elsahar, Chris Emezue, Alham Fikri Aji, Suzana Ilic, Nurulaqilla Khamis, Colin Leong, Maraim Masoud, Aitor Soroa, Pedro Javier Ortiz Suárez, Zeerak Talat, Daniel van Strien, Yacine Jernite
Published in: CoRR (2022)
Keyphrases
- data sets
- database
- databases
- high quality
- synthetic data
- original data
- data processing
- image data
- small number
- computing resources
- complex data
- raw data
- data mining techniques
- data sources
- input data
- knowledge discovery
- computer systems
- end users
- spatial data
- experimental data
- data distribution
- prior knowledge
- data objects
- association rules
- data structure