A Large-scale Dataset of (Open Source) License Text Variants.
Stefano Zacchiroli
Published in: CoRR (2022)
Keyphrases
- open source
- source code
- open source software
- software package
- case study
- real life
- text mining
- information retrieval
- text retrieval
- natural language generation
- database
- synthetic datasets
- small scale
- open source projects
- key concepts
- benchmark datasets
- string matching
- real world
- automatically extracted
- data analytics
- million images
- textual data
- training dataset
- text data
- free text
- text classification
- machine learning
- data sets