Croissant: A Metadata Format for ML-Ready Datasets.
Mubashara Akhtar, Omar Benjelloun, Costanza Conforti, Joan Giner-Miguelez, Nitisha Jain, Michael Kuchnik, Quentin Lhoest, Pierre Marcenac, Manil Maskey, Peter Mattson, Luis Oala, Pierre Ruyssen, Rajat Shinde, Elena Simperl, Goeffry Thomas, Slava Tykhonov, Joaquin Vanschoren, Steffen Vogler, Carole-Jean Wu
Published in: CoRR (2024)
Keyphrases
- metadata
- multimedia
- digital libraries
- maximum likelihood
- databases
- Dublin Core
- learning objects
- data sets
- social networks
- open access
- metadata elements
- database
- metadata extraction
- metadata standards
- digital documents
- XML format
- electronic documents
- synthetic and real datasets
- data repositories
- multimedia content
- spatial data
- real world